




Vol. XXXVI FEBRUARY, 1945 No. 2


The Journal of Educational 
Psychology 


Devoted Primarily to the Scientific Study of Problems of Learning and Teaching 







CONTENTS 


The Economy of Time in Industrial Training. ........ . 65 
A. W. VANDER MEER


Comparative Speed of Joined and Unjoined Writing Strokes . . . 91 
GERTRUDE HILDRETH 


The Effect of Choice Placement on the Difficulty of Multiple-choice
Questions . . . . . . . . . . . . . . . . . . . . . . . . 103
WALTER J. MCNAMARA AND ELLIS WEITZMAN 


The Clinical Significance of IQ’s on the Revised Stanford-Binet Scale 114 
G. W. PARKYN 


Relationship between the Goodenough Drawing a Man Test and the 
1937 Revision of the Stanford-Binet Test. . . . . ..... 119 


GELOLO McHUGH 


EEE IG Ok. Ae: Me ERIE GER Re OER BPE eh 125 


$6.00 per Year * Published Monthly September to May 


WARWICK & YORK, INC. 
BALTIMORE, MD. 


Entered as Second Class Matter Nov. 15, 1921, at the Post Office at Baltimore, Md. 
under the Act of March 3, 1879; additional entry as Second Class Matter at York, Pa. 










THE JOURNAL OF 
Educational Psychology 


Established 1910 


EDITED BY 


Jack W. Dunlap
University of Rochester


IN ASSOCIATION WITH 





Stephen M. Corey                 H. H. Remmers
University of Chicago            Purdue University
John G. Darley                   Percival M. Symonds
University of Minnesota          Teachers College (Columbia University)
Bertha Peterson Harper           Paul A. Witty
University of Rochester          Northwestern University
Harold E. Jones                  H. E. Buchholz
University of California         Managing Editor


THE JOURNAL OF EDUCATIONAL PSYCHOLOGY is devoted pri-
marily to the scientific study of problems of learning, teaching,
and measurement of the psychological development of the indi-
vidual. THE JOURNAL will contain articles on the following sub-
jects: the psychology of school subjects; experimental studies of
learning; the development of interests, attitudes, and personality,
particularly as related to school adjustment; emotion, motivation, 
and character; mental development and methods. This last will 
include tests, statistical techniques, and research techniques in 
cross-sectional and developmental studies. 


Dr. Jack W. Dunlap is now on leave of absence from the 
University of Rochester and serving as Lieut. Commander in the 
U. S. Navy. For the ‘duration’, therefore, manuscripts, books 
and other materials for review, and correspondence regarding 
editorial matters should be addressed to H. E. Buchholz, Warwick 
& York, 10 E. Centre St., Baltimore, Md. 


Manuscripts should be typed and double-spaced throughout 
including quotations, footnotes, and references. Return postage 
should be included with all unsolicited manuscripts. 


THE JOURNAL is published monthly from September to May.
The price per year in the United States and Pan-American coun- 
tries is $6.00; $6.20 to Canada; and $6.40 to other foreign countries. 
Part-year subscriptions are 90 cents per issue ordered. Back vol- 
umes are $7.00 each, and back issues $1.10. 

Subscribers should notify the Publishers of change of address 
at least four weeks in advance of publication of issue with which 
the change is to take effect. Claims for non-receipt of an issue will
not be honored unless made within two weeks after receipt of next 
succeeding number. 


WARWICK AND YORK, Publishers, BALTIMORE, MD.














THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 








Volume XXXVI February, 1945 Number 2 








THE ECONOMY OF TIME
IN INDUSTRIAL TRAINING

AN EXPERIMENTAL STUDY OF THE USE OF SOUND FILMS IN THE
TRAINING OF ENGINE LATHE OPERATORS


A. W. VANDER MEER 
Lt.(jg) U.S.N.R. 


A number of experimental studies have shown that sound 
motion pictures are effective tools for teaching factual informa- 
tion, promoting inductive reasoning, and developing attitudes. 
How useful are films in predominately motor learning situations? 
Is the tremendous popularity of training films in the Armed 
Forces and in industry justifiable on scientific grounds? In order 
to provide a partial answer to the foregoing questions, the 
author developed a technique for using sound films in the train- 
ing of engine lathe operators; then conducted a controlled experi- 
ment to determine whether such a technique could result in a 
saving of time in the teaching of twelve lathe skills. 

The choice of engine lathe trainees was conditioned by three 
factors: (a) the problems involved in training this type of worker 
were considered to be fairly representative of those involved in 
training a large class of machine operators, (b) it was possible
to work with larger groups of lathe trainees than trainees on any
other machine, and (c) U. S. Office of Education training films
were available on most lathe operations.

The Hypotheses Tested.—The main (null) hypothesis tested was 
that prospective engine lathe operators whose instruction includes 
the carefully integrated use of training motion pictures do not 
develop the essential skills to the required level in a shorter time 
than those whose training does not include such visual aids. A
second hypothesis was that the trainees whose instruction
includes motion pictures do not learn more technical information
related to machine operation than do trainees whose instruction
does not include such experience.


Definition of Terms.—Two terms should be defined in the 
interests of clarity. As used in this research, the term ‘inte- 
grated film use’ excludes the type of film use involved in time and 
motion studies. Instead of being based on an analysis of errors, 
films used in this research were intended to provide the best 
possible model of the skills to be learned. Furthermore, as will 
be indicated later, the term integrated as applied to film utiliza- 
tion implies that motion pictures are used as an integral part of 
the teaching process. 

A second term that should be defined is the ‘development of 
skills to the required level.’ It is always difficult to draw a 
definite line distinguishing those who have learned the skill from 
those who have not. When the application of the skill results in 
a product which may be accurately and objectively measured, 
however, the extent to which an individual has learned that skill 
may be stated in terms of the quality of the product and the 
length of time required to produce it. Quality can be held con- 
stant when the product is of such a nature that it can logically 
be either accepted or rejected without reference to any intermedi- 
ate quality between acceptance and rejection; then the measure 
of an individual’s skill is the length of time he requires to turn 
out an acceptable product. 

The development of skills on the engine lathe may be deter- 
mined in the manner described in the foregoing paragraph, for 
when a lathe hand is given a blueprint and a piece of stock he is 
expected to turn out a job that meets certain definite specifica- 
tions within prescribed tolerances. Variations within the 
tolerances make no difference in the quality of the work. As 
long as the specifications are met, the only criterion of the 
worker’s skill is the amount of time he takes on the job. 

For the purposes of the present research, a skill is said to have 
been learned when an individual has applied it independently in 
turning out an acceptable piece of work requiring the applica- 
tion of the skill. Since in actual machine shop practice the 
amount of stock that a worker spoils by cutting off too much 
metal is also a criterion of his skill, data on the number of under-
size pieces turned out by each group will also be compared.













Previous Studies.—Although motion pictures have been widely 
used in instruction in motor skills, relatively few controlled 
experiments have been conducted on this subject. Freeman,
Shaw and Walker¹ found that penmanship classes taught with
motion pictures improved more in ‘writing position’ than did
classes taught with a variety of other methods. No significant
differences in quality of handwriting resulted, however. Hollis,²
McClusky,³ and Rolfe⁴ have experimented with the use of silent
motion pictures in the teaching of home economics, handicrafts, 
and physics laboratory work. Their findings agree that the 
demonstration is superior to the unaided film in teaching repre- 
sentative skills in these school subjects. 

Priebe and Burton⁵ paired twenty-six adolescent boys in terms
of age, height, weight, leg spring, previous athletic experience, 
and natural jumping ability. The conclusions were that the use 
of films in teaching high jumps resulted in greater improvement 
and shortened the initial period of trial-and-error learning. 

The Curriculum Setting of the Experiment.—The experiment 
was conducted within the framework of a regular machine 
operator’s training program. The course was given in the Mor- 
ton High School (Cicero, Illinois) shops for new employees of the 
Amertorp Naval Ordnance Plant. The training period was 
divided into six forty-hour weeks. Classes met eight hours per 
day from 10:30 P.M. to 7:00 A.M. Monday through Friday. 

The general objectives of the training course were as follows: 

1) The development of specific skills in lathe operation. 

2) The acquisition of basic information relating to machine 
operation. 

3) The development of skill in blueprint reading and shop
mathematics sufficient to carry on effectively in a factory
situation.

4) The development of habits and attitudes conducive to safe
and efficient machine operation.





1 Visual Education, A Comparative Study of Motion Pictures and Other
Methods of Instruction. Frank N. Freeman, Editor. p. 309. Chicago:
The University of Chicago Press, 1924.

2 Ibid. pp. 339-41.

3 Ibid. pp. 310-34.

4 Ibid. pp. 335-38.

5 Priebe, Roy and Burton, William H. “The Slow Motion Picture as a
Coaching Device,” School Review, xiv (March, 1939).










There was one instructor in charge of each of three machine 
shops. All three instructors were craftsmen. None had had 
previous training in the theory or methods of teaching. The 
instructors in the beginners’ and advanced shops had been 
teaching for approximately six months at the start of the experi- 
ment. The instructor in the intermediate room had been teach- 
ing for only one month. 

The trainees were sent to the training center in groups of 
approximately fourteen. Each group spent an equal amount of 
time under the guidance of each instructor; two weeks each in 
the beginner’s shop, the intermediate shop, and the advanced 
shop. Up-grading experiences designed to help the students to 
progress toward the four general objectives were provided in each 
room. The lathe skills required became more difficult as the 
student progressed, and the tolerances allowed on jobs became 
smaller. 

The following outline indicates something of the division of 
subject-matter among the three rooms: 


1. Beginner’s Shop.
a. General care and operation of the lathe.
b. Uses and care of small tools.
c. Lathe tool grinding.
d. Reading the micrometer and steel rule.
e. Truing, facing and centering work held in a chuck.
f. Aligning lathe centers.
g. Rough turning between centers. 
h. Turning work of two and five diameters. 
i. Blueprint reading. 
j. Shop mathematics. 
k. Shop safety. 
2. Intermediate Shop. 
a. Turning work to five diameters. 
b. Turning a taper with tailstock set over. 
c. Turning a taper with the taper attachment. 
d. National fine threading. 
e. Turning a taper with the compound rest. 
f. Knurling. 
g. Blueprint reading. 
h. Shop mathematics. 
i. Shop safety. 







3. Advanced Shop.
a. Use of the dial indicator and steady rest.
b. Left-hand and right-hand coarse threading.
c. Three wire method of thread measurement.
d. Advanced care and operation of the lathe.
e. Shop safety.
f. Drilling and boring work held in a chuck.
g. Theory and operation of milling machine, surface grinder, and
drill press.
h. Shop mathematics.
i. Blueprint reading.


The equipment was adequate. While the lathes in the 
beginner’s room were noticeably inferior to those in the inter- 
mediate and advanced rooms, this fact could not affect the 
relative standings of the groups since all groups used the same 
equipment. There were slight differences in quality of machines 
within rooms. This was compensated for by rotating the 
students’ assignments to lathes. 

The Groups Used in the Experiment.—The groups used were 
seven consecutive classes of engine lathe trainees from the Amer- 
torp Naval Ordnance Plant. No special selection was applied 
to determine which employees would come to the Morton Train- 
ing Center where the experiment was being carried on. Each of 
the first six groups was selected in its entirety as either experi- 
mental (film) group or as a control (non-film) group. In order to 
equalize the advantage of better teaching on the part of the 
instructors as the experiment went on the first, fourth and fifth 
groups were chosen as controls and the second, third and sixth 
groups were chosen as experimental groups. The seventh 
group was the only one in which the trainees were divided 
between the experimental and control groups.

The size of the groups ranged from eleven to seventeen. It 
would probably be safe to say that the people who served as sub- 
jects for this experiment were fairly typical of the thousands who 
were hired for training and work in war plants all over the coun- 
try. The age of the trainees ranged from eighteen to fifty-five 
years with a median of thirty. About one-third of them had had 
four years of high school, most often in a general or commercial 
course. Women outnumbered men by three to one. 







Both experimental and control groups were told frankly that 
they were being used as subjects of an experiment and were asked 
for their cooperation. The false impression of lack of importance
frequently given to control groups in similar experimental situ- 
ations was scrupulously avoided. On the contrary, the impor- 
tance of their part in the experiment was strongly impressed 
upon all groups, and there was every evidence in the actions of 
the trainees that they realized the value of such an experiment. 
Furthermore, the spirit of competition among groups was avoided. 
While the members of various groups knew in a general way the 
significance of the time scores which they were keeping, they had 
little opportunity to compare group averages on any of the 
criterion measures. 

The Treatment of the Groups.—The essential difference in the 
treatment of the control and experimental groups was in methods 
and materials of instruction. All groups, with the exception of a 
few individuals, completed a six weeks’ training course on the 
engine lathe. The same instructors taught all groups, all groups 
worked in the same shops with the same equipment and on the 
same lathe projects. All were given the same general basic back- 
ground information in one way or another. However, with one 
set of groups, the experimental or film groups, a battery of U. S. 
Office of Education training films was used as an integral part of 
the instruction. The other set of groups, the control or non-film 
groups, viewed no films, but rather were taught with the conven- 
tional lecture-demonstration method. The experimental groups 
did not forego demonstrations entirely, but the use of films usually 
made it unnecessary for their demonstrations to be so long or so 
numerous as those of the control groups. The instructors gave a 
considerable amount of individual attention to the members of 
both groups, and in every respect except the film teaching the 
groups were taught in a similar manner. 

The Required Lathe Jobs.—Each trainee in each group was 
required to finish twelve projects which were designed to develop 
skill in what were considered essential engine lathe operations. 
The major comparisons between the groups were based upon the 
average time spent by the trainees per successful trial on each 
job. On all but two of the jobs two or more trials were required. 
Such trials were accomplished by turning the same piece of stock 
down to successively smaller dimensions. It might appear, on
the surface, that under such conditions a trainee might reduce his
time score by being, for example, .050 inch under size on his first 
trial. This would leave him only .013 inch of stock to remove for 
his second trial. However, the method of computing the average 
time per successful trial was specifically designed to prevent such 
a procedure from working to the advantage of the trainee. The 
time is figured cumulatively; therefore, the lengthened time on 
the first trial would not in the end be subtracted from any impor- 
tant score, but would in reality be added to his total average. 
This distinction will be explained more fully in the section on 
keeping the time record. 

The required lathe jobs are listed and described in Table 1, 
which also indicates which films were used in the instruction of 
the film groups. It should be pointed out that single films fre- 
quently demonstrated the skills involved in several jobs. In 
such cases, a single film lesson was all that was employed. 

All the films used were produced under the supervision of the 
U.S. Office of Education. Their contents have been described 
in a number of periodicals such as Business Screen, and need not 
be repeated here. 

The Comparisons Based on Time Scores. Keeping the Time 
Records.—The major comparisons between the two groups, as 
previously pointed out, were made in terms of the average time 
they required to finish each of the twelve lathe projects to the 
tolerances allowed. Of critical importance to the experiment, 
therefore, was the method of recording and computing the time 
spent by each trainee on each job. Two time records were kept. 
On each trainee’s drawings of his required projects was a form for 
keeping his time record. There were blanks in which to enter 
to the nearest five minutes the time when work was started and 
finished. Time taken from the job for tool sharpening, break- 
down of lathes, interruptions for lectures, and so on, was also 
recorded. Instructors kept careful check to see that trainees 
‘kept their time’ faithfully. Checking of the time record was 
made a part of checking on the acceptability of each trial of each 
job. 

A second time record was kept as a partial check on the indi- 
vidual records kept by the students. Each class had a group 
time sheet for every day of the training period. On this sheet 
each trainee entered his beginning time and ending time not only
for his various jobs and trials, but also for all the other activities.
One student in each group was placed in charge of this record 
sheet and helped the instructor to see to it that time records were 
accurately and faithfully kept. 


TABLE 1.—LATHE JOBS AND TRAINING FILMS

Job   Skill                    Tolerances   Critical Dimensions                Trials    Film
                               (± inches)                                      Allowed

1-L   Straight turning         .001         .063 Reductions .050 inch          4         Rough turning between centers
2-L   Work of two diameters    .001         .063 Reductions .125 inch          2         Turning work of two diameters
3-L   Work of five diameters   .001         .063 Reductions .063               4         "
1-H   "                        .0005        .063 Reductions .063               4         "
4-H   Taper turning            .0005        .063 Tapers .300, .475, .600       3         Turning a taper with the tailstock set over
5-H   Fine threading           Nut test     .063 National fine #11, 13, 16     3         Cutting a national fine thread
6-H   Taper turning            .0005        .063 .600 Morse                    1         Turning a taper with the taper attachment and with the compound rest
7-H   Taper turning            1 degree     .063 30° internal, 45° external    1         "
1-V   [illegible] chuck        .002         eccentric                          1         Height gages and indicators
2-V   Coarse threading         .0005        No. 6                              2         Cutting external acme thread
3-V   Drilling and boring      .0005        Increases .050 inch                5         Drilling, boring and reaming work held in chuck


The experimenter maintained closest touch with every group
and made it a practice to check the time records at least every
other day. Inconsistencies and failures to enter time records
were rare. Those that did occur were usually confined to the
trainee’s first few days of work. Consequently, the investigator
has a great deal of confidence in the accuracy of the time records.



















Computing the Time Scores.—In stating the original hypothesis 
it was pointed out that lathe projects lend themselves admirably 
to an absolute system of acceptance or rejection. For example, 
a trainee may be asked to turn a steel rod down to one inch in 
diameter. He is told that his tolerance is .001 inch. It will 
make no difference in the quality of his work how close his finished 
piece is to one inch as long as it is no farther off than .001 inch. 
A piece measuring .999 inch is just as good as a piece measuring 
1.000 inch. By the same token, a piece measuring .998 inch can- 
not be counted ninety-nine per cent correct, for it is probably 
worthless for the purpose for which it was made. 

It is, therefore, logical to compute the individual’s time score 
in terms of how much time he required to turn out his job to 
specifications regardless of how many pieces he has ruined before 
getting it right. Since several jobs required several trials it was 
possible for an individual to turn out more than one acceptable 
specimen of a given job. In such cases the time for the several 
trials was averaged. Here is a sample record of an individual 
completing Job 1-L, which required four trials: 


First trial —100 minutes—accepted 
Second trial— 75 minutes—rejected 
Third trial — 70 minutes—accepted 
Fourth trial— 70 minutes—accepted 


His time score for the job would be the total time, 315 minutes, 
divided by the number of successful trials, (three), or 105 min- 
utes per successful trial. It will be seen that the time score is the 
average number of minutes spent by the individual per successful 
trial on any given job. The time on which the average is based 
is the cumulative total time spent, regardless of how many trials 
meet the rigidly defined and objectively measured specifications. 
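
The computation described above can be sketched in a few lines of Python. This is an illustrative sketch and not the investigator's own procedure: the names Trial and time_score are hypothetical, and returning None when no trial has been accepted is an assumption, since the article does not say how such a case was scored. The sample data are the four trials of Job 1-L quoted above.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    minutes: int    # time spent on this trial
    accepted: bool  # True when the piece met the specifications


def time_score(trials):
    """Average minutes per successful trial.

    The numerator is the cumulative time on ALL trials, accepted or
    rejected; the denominator counts accepted trials only.
    """
    accepted = sum(1 for t in trials if t.accepted)
    if accepted == 0:
        return None  # assumption: no score until one piece is accepted
    total_minutes = sum(t.minutes for t in trials)
    return total_minutes / accepted


# Sample record for Job 1-L from the text.
job_1l = [
    Trial(100, True),
    Trial(75, False),
    Trial(70, True),
    Trial(70, True),
]
print(time_score(job_1l))  # 315 minutes / 3 accepted trials = 105.0
```

Because rejected trials add to the numerator without adding to the denominator, a spoiled piece always raises the score, which is the property the cumulative method was designed to secure.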

Informational Gain.—In addition to the records of average time 
per successful trial on each of the twelve jobs required in the 
course of the training program, it was the intention of the investi- 
gator to collect data bearing on the gains in factual information 
made by the trainees. Accordingly the Purdue Test for Machin- 
ists and Machine Operators was given to all groups on entering 
and again on leaving the training program. Informational gains 
were computed by subtracting the entering test score from the 
leaving test score. The same form of the test was used both as
pre-test and post-test. Such a procedure is partially justified
by the fact that the test is an hour long and is made up of one 
hundred thirty-three items nearly all of which were unfamiliar 
to the entering trainee. The instructors were given no oppor- 
tunity to familiarize themselves with the test, so that even had 
they been biased in favor of one or the other of the groups they
could not have given special instructions to any group. 

Factors Considered in Equating Groups.—The factors finally 
considered in equating groups were selected from a variety 
of information gathered at the beginning of the experiment. 
Records of the routine employment interview were examined to 
determine the age, work experience, education, hobbies and sex of 
each trainee. Experience and hobbies were classified as mechan- 
ical or non-mechanical. Mechanical hobbies were listed by 6.8 
per cent of the control and 10.6 per cent of the experimental 
group. Of the control group 31.8 per cent claimed previous 
occupations classed as mechanical as compared with 34 per cent 
of the experimental group who claimed such previous occupa- 
tions as assembly work, automatic machine operation, carpentry, 
and so on. 

The first task of each entering group of trainees was to take a 
battery of tests which were intended for use in further equating 
of groups. The group tests given included: (1) the Purdue 
Industrial Training Classification Test, (2) Minnesota Paper Form 
Board, (3) The Wonderlic Personnel Test, and (4) the Purdue 
Test for Machinists and Machine Operators. Before administer- 
ing the tests the investigator described the purpose and impor- 
tance of the experiment, and explained the relation of the tests to 
the study. This preliminary talk and the order of presentation 
of the tests were identical for all groups. The whole testing time 
was approximately two and one-half hours including a ten-min- 
ute recess between taking Classification and Minnesota tests 
which were administered first, and the Personnel and Machin- 
ist’s tests which were administered last. Each test was checked 
by two independent scorers, and discrepancies in score that 
occurred were reconciled. 

Motor ability was measured by an individual test adminis- 
tered within the trainee’s first two days in the course. Identical 
directions were given each trainee taking the test. Since it was 
a performance test, care was taken to exclude onlookers. This
precaution protected the test subject’s peace of mind and mini-
mized the possibilities of some trainees ‘learning’ the test before
taking it. 

The apparatus required for the test was designed by the 
investigator, but the original idea for the type of test may be 
credited to Hull.¹ A mechanical pencil was held vertically by
a steel rod bolted into the tool post on the compound rest of an 
engine lathe. A drawing board was bolted in position on the 
ways of the lathe. By turning the two wheels on the compound 
rest it was possible to make the pencil follow around a two-inch 
circle tacked on the drawing board. The student was instructed 
to “make the pencil follow around the circle in the shortest possi-
ble time without getting any farther from the circle than neces-
sary,” and the manner of turning the wheels so that the pencil
would follow the circle was demonstrated. 

Elimination of Certain Factors in Equation.—At the close of 
the experiment, trainees’ ages, years of education and scores on 
the tests of perception, mathematics, motor ability, and tech- 
nical information were correlated with an over-all criterion of 
speed and accuracy in lathe operation which was derived from 
the time scores on the twelve jobs. The correlation coefficients 
are shown in Table 2. Although only two of the correlations 
are positive in terms of the sign which precedes them, all but one 
indicate a positive relationship. Thus they should be inter-
preted somewhat as follows: the higher the score an individual
makes on the Classification test, for example, the lower his aver-
age time per successful trial on any given job is likely to be.
The negative signs on correlation coefficients for all but motor
ability are interpreted in this manner. Since in the case of
motor ability a low score indicates greater ability than a high
score, the positive correlation with the time score indicates a
positive relationship. The correlation between the Minnesota
score and the time criterion score indicates a slight negative
relationship.

TABLE 2.—CORRELATIONS BETWEEN PROPOSED EQUATING FACTORS
AND AVERAGE SIGMA CRITERION SCORE

Equating Factor                                          Correlation

Minnesota Paper Form Board                               +.022 ± .158
Wonderlic Personnel Test                                 −.093 ± .156
Purdue Industrial Training Classification                −.349 ± .139
Motor Ability                                            +.409 ± .132
Purdue Test for Machinists and Machine Operators         −.199 ± .152
[illegible; age or years of education]                   [illegible]
[illegible; age or years of education]                   −.069 ± .157

1 Hull, Clark L., Aptitude Testing, p. 68, Yonkers, N.Y.: World Book Co.,
1928.
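
The sign conventions just described are easy to confuse, so a short Python sketch may help. It is only an illustration under stated assumptions: the function names pearson_r and favors_speed are hypothetical, the product-moment formula is the standard one and is not confirmed by the article as the exact computing method used, and the four-trainee data set is invented for the example.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def favors_speed(r, low_score_is_better=False):
    """True when greater ability on the test goes with a LOWER time criterion.

    For most tests a high score means more ability, so a negative r is the
    favorable direction. For the motor ability test a low score means more
    ability, so a positive r is the favorable direction.
    """
    return r > 0 if low_score_is_better else r < 0


# Hypothetical data: test scores and time criterion for four trainees.
test_scores = [1, 2, 3, 4]
time_criterion = [8, 6, 4, 2]
r = pearson_r(test_scores, time_criterion)
print(round(r, 3), favors_speed(r))
```

Applied to the coefficients quoted in the text, favors_speed(-0.349) and favors_speed(0.409, low_score_is_better=True) are both true, while favors_speed(0.022) is false, which matches the reading given for the Classification, motor ability, and Minnesota tests.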

The time criterion against which these initial test scores were 
correlated was obtained by computing sigma score equivalents 
for the time scores on each of the twelve jobs required of the 
trainees. Each trainee’s array of sigma scores was averaged. 
This average was used in correlating with the initial measures. 
This procedure is justified on the assumption that these averages 
would be a more reliable measure of an individual’s speed and 
accuracy in lathe work than his time score on any single job. 
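
A minimal sketch of this criterion follows. It assumes that a 'sigma score' is the ordinary standard score, that is, the deviation of a trainee's time score from the job mean divided by the job standard deviation, and it assumes the population form of the standard deviation; the article states neither detail. The function names and the three-trainee data are hypothetical.

```python
from statistics import mean, pstdev


def sigma_scores(time_scores):
    """Convert one job's time scores {trainee: minutes} to standard scores."""
    values = list(time_scores.values())
    job_mean = mean(values)
    job_sd = pstdev(values)  # population SD: an assumption, see lead-in
    if job_sd == 0:
        raise ValueError("identical time scores; sigma scores are undefined")
    return {name: (t - job_mean) / job_sd for name, t in time_scores.items()}


def average_sigma(jobs):
    """Average each trainee's sigma scores over the jobs he completed."""
    per_trainee = {}
    for time_scores in jobs.values():
        for name, z in sigma_scores(time_scores).items():
            per_trainee.setdefault(name, []).append(z)
    return {name: mean(zs) for name, zs in per_trainee.items()}


# Hypothetical time scores (minutes per successful trial).
jobs = {
    "1-L": {"A": 90.0, "B": 105.0, "C": 120.0},
    "2-L": {"A": 60.0, "B": 80.0},  # trainee C did not complete this job
}
criterion = average_sigma(jobs)
for name in sorted(criterion):
    print(name, round(criterion[name], 3))
```

Standardizing within each job before averaging keeps a long job from dominating the criterion merely because its times are larger, and it allows trainees who completed different numbers of jobs to be placed on one scale.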

Age, years of education, and scores on the Minnesota and Per- 
sonnel tests were eliminated as factors for use in equating groups 
for comparisons of time required on the twelve lathe jobs. Since 
none of their correlation coefficients were as high as .10, it was 
felt that their relatively small value in predicting the time scores 
as computed in this research did not justify the effort required 
to consider them in equating groups for the time score compari- 
sons. However, as will be seen later, they were used in equating 
groups for the purposes of comparing their information gain. 

Not all trainees completed all jobs successfully. It was neces- 
sary to equate groups separately for each of the twelve jobs on 
which the groups were compared in terms of time scores. Table 3
shows the means and standard deviations of the groups compared 
on each job. Also shown in Table 3 are the means and standard 
deviations for the groups compared in terms of average sigma 
scores on all jobs completed. 

It will be noted that only seven of the thirty-nine comparisons 
favor the experimental group. On every comparison the control 
group had a higher mean mathematics score than did the experi- 
mental group. On eleven of the thirteen comparisons they had 
a higher mean motor ability score than the experimental group. 
It will be remembered that of all the equating factors considered, 
mathematics and motor ability correlated most highly with the 
time criterion. Thus it can be said with some assurance that
any differences found in favor of the film group are not likely
due to superior initial status. 

Factors considered in equating groups for purposes of compar- 
ing informational gain are shown in Table 4. The film group 
included forty-three cases; the non-film group had higher mean 
scores on three of the four equating factors. The one difference 
favoring the experimental group, that of mean technical informa- 
tion score, involves less than half a raw score point. 


TABLE 3.—EQUATING OF GROUPS FOR TIME COMPARISONS

                     Classification      Technical           Motor Ability      Number
                     Test                Information Test    Test
Job                  Film    Non-Film    Film    Non-Film    Film    Non-Film   Film   Non-Film

1-L     Mean         11.032  11.977      16.712  16.636      .513     .573      47     44
        SD            6.565   7.077      14.877  12.999      .904    1.019

2-L     Mean         10.725  12.368      16.050  18.132      .585     .612      40     38
        SD            6.635   6.940      15.060  12.830      .839     .991

3-L     Mean         10.793  12.378      16.256  17.072      .666     .666      41     41
        SD            6.579   7.034      14.666  13.238      .822     .939

1-H     Mean         11.157  12.757      17.586  17.334      .626     .632      35     35
        SD            6.454   7.319      15.060  13.752      .823    1.019

4-H     Mean         11.878  12.526      18.068  17.660      .614     .589      37     38
        SD            6.427   7.220      15.838  13.452      .802     .951

5-H     Mean         11.246  13.045      17.628  17.590      .593     .616      39     33
        SD            6.351   7.345      15.006  14.220      .845    1.061

6-H     Mean         11.263  12.306      17.394  17.778      .585     .589      38     36
        SD            6.507   7.098      14.794  13.550      .848    1.036

7-H     Mean         10.962  13.569      17.012  18.638      .598     .735      39     29
        SD            6.531   7.012      14.710  14.422      .848     .957

1-V     Mean         11.044  12.419      17.166  17.744      .658     .581      36     37
        SD            6.745   7.158      15.142  13.480      .820     .976

2a-V    Mean         10.671  12.726      16.100  16.048      .502     .500      35     31
        SD            6.381   7.088      13.230  12.788      .828    1.035

2b-V    Mean         11.068  13.083      15.472  17.556      .571     .578      37     36
        SD            6.664   6.796      15.542  13.560      .774    1.051

3-V     Mean         10.412  14.100      15.326  18.400      .471     .960      34     20
        SD            6.298   7.456      12.954  13.228      .809     .986

Average sigma score
        Mean         11.603  12.425      17.526  17.350      .656     .700      39     40
        SD            6.188   7.198      15.365  13.318      .824    1.030

TABLE 4.—EQUALITY OF GROUPS FOR COMPARISON ON INFORMATIONAL GAIN

                                    Non-Film Group      Film Group
Equating Factor Test Scores            Mean       SD     Mean       SD

Minnesota Paper Form Board           37.198    8.953   35.228    9.228
Wonderlic Personnel Test             19.953    5.111   19.000    5.854
Purdue Classification Test           11.686    7.129   11.000    6.528
Purdue Test for Machinists and
  Machine Operators                  16.872   13.056   17.364   15.412

SUMMARY 


The experiment involved seven small groups of engine lathe 
trainees from Amertorp Naval Ordnance Plant. U. S. Office of
Education training films for lathe workers were made an integral 
part of the instruction of one half of the trainees. The other half 
were taught with the conventional lecture-demonstration method. 
The groups were equated on the basis of tests of motor ability, 
mathematical proficiency, and technical knowledge for purposes 
of comparing the time they required per successful trial on each 
of twelve jobs that constituted their curriculum for skill develop- 
ment. For the purpose of comparing the groups in terms of their 
gain in technical information the factors used in equating groups 
were scores on tests of technical information, mathematical 
proficiency, perceptual ability, and general intelligence. 

The Technique of Film Use.—As it is stated, the main hypothe- 
sis to be tested in the present research suggests that motion 
pictures are to be evaluated in terms of their effectiveness in 
saving time in the process of skill development. In a very real 
sense, however, neither films nor any other curriculum material 
can be evaluated apart from the methods by which they are 
utilized. Motion pictures are not used in a vacuum. Any 
evaluation of a motion picture or series of them is equally an 
evaluation of a method of film utilization. Consequently, the 
technique of film use employed in this research will be described 
in some detail. 

The specific methods of utilization of films varied from time 
to time in response to the many changing factors that appear in 
dealing with human beings. It is probably safe to say that no 
two film showings were exactly the same. There were, however, 
a number of general principles of film use that were followed 
throughout the experiment. 

Timing.—Ideally, film lessons should be timed to come at the 
point in the student’s progress at which they will be most helpful. 
Consequently, an effort was made to have a film on any given 
lathe operation just before the trainees were ready to start
practicing that operation. Conversely, the class was never kept
waiting for a film lesson when the members were ready to start
applying it.

Naturally, there was a tendency for the film lessons to be timed 
less well for the very fastest or the slowest workers. The aver- 
age, of course, set the learning pace. However, this difficulty 
could not occur at the beginning of a training period, and the fact 
that the six-week course was divided into three intervals pre- 
vented any serious difficulty in timing film lessons to suit the 
needs of all students. 

The emphasis placed upon proper timing of film lessons may 
seem to belabor the obvious. It should be noted, however, that 
there is strong temptation in the field of visual instruction to 
follow the administratively more convenient plan of lumping 
large units of film lessons together. All too often ‘visual educa- 
tion’ is accomplished by gathering together two or three hours’ 
worth of training films and showing them to all trainees regard- 
less of what machines they are operating and how much experi- 
ence they have had. In the present research no film showing 
lasted more than twenty-five minutes. 

Integration.—It has been stated previously that films used in 
this experiment formed an integral part of the training course. 
Usually the instructor demonstrated or described the next job to 
be done before the film was shown. After the film lesson the 
class was helped to make the connection between what was shown 
in the film and what was to be done in class. When the lathes 
used in the film were different from those available to the trainees, 
differences and similarities were pointed out. 

Projects required of the students were similar to those turned 
out in the film demonstration, but were sufficiently different to 
encourage transfer of knowledge to the performance of new tasks. 
Where alternative methods were demonstrated in the films the 
relative merits of each were discussed, and the students were 
helped to select the one best suited to themselves and the equip- 
ment which they were learning to operate. 

When students required individual help in turning out their 
projects, the instructors habitually referred to the films as well as 
to any other previous teaching. In a very real sense, then, the 
film was made a part of instruction; it was never by word or 
action relegated to the status of a frill or an easily dispensed with 
supplementary device. 

The Film Presentation.—Preceding each film lesson an attempt 
was made to motivate learning, to direct attention to important 
things to look for in the film, and to do any pre-teaching that 
might be considered essential to understanding the film. 

These things were accomplished by pointing out the applica- 
tion of the skill to industry, by describing the job that the student 
was to do next, and by pointing out the various steps required in 
turning it out. The students were then shown the film for the 
first time without interruption. 

When the film was over, the students filled out a self-testing 
exercise of short answer questions which was a part of a com- 
mercially prepared study guide. A discussion followed during 
which the important parts of the film were reviewed, misconcep- 
tions were brought to light and corrected, and the self-testing 
exercises were checked. 

There followed a second showing of the film. During this 
showing the film was stopped when necessary to bring out certain 
important points. A brief summary followed the final showing. 
As has been stated, upon returning to their shop the trainees were 
given further help in making the transfer from the film to their 
own work. 

Mechanical Aspects.—All film lessons were conducted in a class- 
room rather than in the individual shops. The classroom was 
kept well ventilated and at the proper temperature as far as it 
was possible to do so. Since all the training took place after
10:30 P.M., there was no difficulty in darkening the room
adequately. An effort was made to have the projection equipment
set up and the film threaded before the class arrived. The
experimenter took charge of the mechanical details of the lesson
so that the instructor could give his complete attention to his
class. Only on rare occasions were classes combined. In
such cases one instructor was given direct responsibility for
conducting the lesson.

1 Visual Learning Guides, Nos. 1-10 and 44-45. Chicago: The National
Audio Visual Council, Inc., 1942.

A spare projection lamp and a spare fuse were carried with 
the machine at all times to avoid delays due to the failure of 
equipment. To minimize distraction the projector was placed
well behind the audience. The speaker was placed on a chair 
directly in front of and below the screen. Every effort was made 
to avoid interruptions during the film showings. 

Administrative Aspects of Film Use.—Two services of an 
administrative nature were supplied by the investigator in an 
effort to assure the maintenance of such standards of film utiliza- 
tion as were described earlier. The first was that of making 
films and projection equipment conveniently available for the 
training program; the second was that of guiding the instructors 
in the application of the principles of film teaching described in 
the foregoing paragraphs. Both were considered essential to 
the valid testing of the film method. 

Time Comparisons.—In order to facilitate statistical compari- 
sons, the data gathered on each of the four experimental groups 
were combined to make a single experimental group. The four 
control groups also were combined into a single group. This 
procedure, although perhaps somewhat unusual, was necessary 
because of the small numbers of cases in the subgroups. The 
numbers in some comparisons would have been as low as five had 
the subgroups been treated separately. 

Table 5 shows the differences in mean time required per 
successful trial on each of the twelve projects that constituted 
the practice work on the lathe. It will be seen that in every 
case the film group required less time, on the average, to complete
their jobs satisfactorily than did the non-film group. Ten
of the twelve differences in average time spent per successful
trial on single lathe jobs are more than twice their standard
errors; that is, their critical ratios exceed 2.
Among these ten, the largest odds that a difference is due to chance
are found in Job 2b-V, where the chances are about two in one hundred.
The critical ratio of the difference in average sigma score is 6.55, 
which is far above that which is ordinarily required for statistical 
significance. 
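
The critical ratios discussed here can be illustrated with a short sketch. It assumes the conventional definition, the difference between two means divided by the standard error of that difference, with the groups treated as independent; the article does not state the exact formula used, and since the groups were equated the author may have applied a different one, so this sketch should not be expected to reproduce every tabled ratio. The function names are illustrative only. The figures for Job 1-L are the means and numbers of cases from Table 5 and the standard deviations from Table 6.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by its standard error.

    Assumption: independent groups, so the squared standard errors add.
    """
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff


def two_tailed_chance(cr):
    """Normal-curve probability of a ratio at least this large in either direction."""
    return math.erfc(abs(cr) / math.sqrt(2))


# Job 1-L: non-film group first, film group second.
cr_1l = critical_ratio(163.140, 128.990, 44, 121.160, 78.450, 47)
print(round(cr_1l, 2))

# A critical ratio of 2.33 corresponds to roughly two chances in one hundred.
print(round(two_tailed_chance(2.33), 3))
```

Under these assumptions the Job 1-L ratio comes out close to the tabled value of 1.86, and a ratio of 2.33 corresponds to a two-tailed probability of about .02.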


TABLE 5.—COMPARISON OF FILM AND NON-FILM GROUPS IN
MEAN NUMBER OF MINUTES REQUIRED PER SUCCESSFUL
TRIAL AND AVERAGE NUMBER OF REJECTIONS ON
TWELVE LATHE JOBS

             Mean      Mean              Critical  Rejections  Rejections         N     N
Job      Non-Film      Film  Difference     Ratio    Non-Film        Film  Non-Film  Film

1-L       163.140   121.160       41.98      1.86       2.068       2.319        44    47
2-L        199.76    145.75       54.01      2.57        .636        .625        38    40
3-L       361.329   264.257      97.072      2.74       1.569       1.732        41    41
1-H        548.57   359.070      189.50      2.51       2.257       1.914        35    35
4-H        296.87    145.04      151.83      3.94       1.632       1.000        38    37
5-H       159.660   109.120       50.54      2.57       1.303        .436        33    39
6-H        341.44    265.03       76.41      3.66        .100       .0789        36    38
7-H       410.362   320.142       90.22      3.44        .488        .000        29    39
1-V       192.880   128.667       64.21      4.12        .000        .000        37    36
2a-V      330.952   236.786       94.17      4.25        .205       .0789        31    35
2b-V      275.611   223.689      51.992      2.33       .0769       .0513        36    37
3-V         77.50     63.32       14.18      1.12       1.200       1.147        20    34

Average Sigma Score
             .333     -.433        .766      6.55                                40    39


It is apparent from Table 5 that the superiority of the film 
group in time scores was not gained at the expense of accuracy. 
In only two of the twelve jobs did the film group have a greater 
average number of rejections than the non-film group. These 
jobs, 1-L and 3-L, were both done in the beginners’ room. One 
might expect smaller differences between the film and non-film 
groups at a time when they had been taught by different methods 
for so short a period as two weeks. 

It seems logical to interpret these results as meaning that sound 
motion pictures can, when used in the manner followed in this 
experiment, result in saving time in the development of skills on 
the engine lathe. 

An examination of the distributions of time spent per success- 
ful trial reveals a tendency for a higher percentage of cases in the 
film group to appear in the lower time intervals and a higher 
percentage of the non-film cases to appear in the higher time 
intervals. For example, about seventeen per cent of the film 
group required less than an hour per successful trial on Job 1-L 
as compared with about seven per cent in the non-film group. 
The aforementioned facts suggest a possible explanation for the 
economy of time resulting from the film method; namely, that 
the use of films reduced the period of trial-and-error activity 
typical of motor learning. In other words, the greater number 
of individuals in the film group approached their maximum 
efficiency more closely during the instructional period. 

From the data presented in the preceding paragraphs one 
might be led to wonder whether or not the film teaching method
resulted in greater homogeneity in the film group. Unfortu- 
nately, the experimental data do not furnish conclusive grounds 
for answering this question, although they do tend to indicate 
an affirmative answer. Table 6 shows differences in homogeneity 
between film and non-film groups in terms of standard deviations 
on each of the twelve lathe jobs. 


TABLE 6.—DIFFERENCES IN VARIABILITY OF FILM AND NON-FILM
GROUPS ON TWELVE LATHE JOBS

               SD        SD              Critical    Number  Number
Job      Non-Film      Film  Difference     Ratio  Non-Film    Film

1-L       128.990    78.450      50.540      3.15        44      47
2-L       132.123    69.303      63.815      3.76        38      40
3-L       183.136   136.516      46.620      1.83        41      41
1-H       302.392   322.347     -19.955       .38        35      35
4-H       101.334    53.390      48.144      3.65        38      37
5-H       217.974    93.402     124.572      4.32        33      39
6-H        97.918    80.067      17.851      1.21        36      38
7-H       115.983    93.693      22.290      1.20        29      39
1-V        81.390    47.925      33.465      3.07        37      36
2a-V      105.877    63.319      42.558      2.76        31      35
2b-V       98.680    91.308       7.372       .47        36      37
3-V        51.000    31.343      19.657      2.21        20      34


The film group proved more homogeneous on all but one of 
these comparisons. Five of the differences were statistically 
significant since they were more than three times their standard 
error. Two others were more than twice their standard error. 
However, it will be recalled that the film group was also more 
homogeneous with respect to initial status on two of the three 
tests used for equating groups. Only in technical information 
did the non-film group exhibit less variability than the film group. 
Since the groups were equated they cannot be considered random 
samples to start with, and there is, therefore, no way of deter- 
mining the significance of initial differences in homogeneity. 
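
The ratios in Table 6 can be approximated with the following sketch. It assumes the large-sample standard error of a standard deviation, SD divided by the square root of 2N, and independent groups; the article does not name the formula, so this is an assumption rather than a statement of the author's procedure. The function name is illustrative. The example uses the Job 5-H entries from Table 6.

```python
import math


def sd_critical_ratio(sd_a, n_a, sd_b, n_b):
    """Difference between two standard deviations divided by its standard error.

    Assumption: SE of an SD is SD / sqrt(2N), and the two groups are independent.
    """
    se_a = sd_a / math.sqrt(2 * n_a)
    se_b = sd_b / math.sqrt(2 * n_b)
    return (sd_a - sd_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Job 5-H: non-film group first, film group second.
cr_5h = sd_critical_ratio(217.974, 33, 93.402, 39)
print(round(cr_5h, 2))
```

For Job 5-H this assumed formula gives a ratio close to the tabled 4.32; agreement on other jobs may be less exact because of rounding in the printed entries.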

If time scores on all jobs could be averaged together, would 
greater differences in favor of the film group become apparent? 
In order to answer this question the average times per successful 
trial of each trainee in both groups were combined in a single 
distribution for each job. On the basis of these combined film 
and non-film group distributions, sigma score equivalents were 
derived for each time score. Each individual’s array of sigma 
scores was then averaged, and the film and non-film groups were
compared in terms of these average sigma scores. The mean of 
the film group was exceeded by the mean of the non-film group 
by .766, indicating that the film group took less time per success- 
ful trial on the average of all twelve jobs when each job was given 
equal weight. The critical ratio of this difference, 6.55, was 
larger than that of the difference in means of any single job. 
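
The averaging procedure just described can be sketched as follows. The trainee identifiers, job names, and times are hypothetical and serve only to show the steps: combine both groups into one distribution per job, convert each time to a sigma score, average each trainee's sigma scores, and compare group means. The use of the population standard deviation is an assumption; the article does not say which form was used.

```python
from statistics import mean, pstdev


def sigma_scores(times):
    """Convert raw times to sigma scores within one combined distribution."""
    m = mean(times)
    sd = pstdev(times)  # assumption: SD of the combined film and non-film cases
    return [(t - m) / sd for t in times]


# Hypothetical minutes per successful trial, keyed by job and then by trainee.
times_by_job = {
    "job_A": {"f1": 100.0, "f2": 120.0, "n1": 160.0, "n2": 180.0},
    "job_B": {"f1": 200.0, "f2": 260.0, "n1": 300.0, "n2": 280.0},
}
group = {"f1": "film", "f2": "film", "n1": "non-film", "n2": "non-film"}

# Collect each trainee's sigma scores across the jobs he completed.
per_trainee = {trainee: [] for trainee in group}
for job, times in times_by_job.items():
    ids = list(times)
    for trainee, z in zip(ids, sigma_scores([times[i] for i in ids])):
        per_trainee[trainee].append(z)

# Average sigma score per trainee, giving each job equal weight.
average_sigma = {trainee: mean(zs) for trainee, zs in per_trainee.items()}

group_means = {
    g: mean(v for trainee, v in average_sigma.items() if group[trainee] == g)
    for g in ("film", "non-film")
}
difference = group_means["non-film"] - group_means["film"]
print(group_means, round(difference, 3))
```

A negative mean sigma score indicates less time than the combined average, so a positive difference (non-film minus film) favors the film group, as in the comparison reported above.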

It will be noted in Table 5 that in general the critical ratios of 
differences were greater for the lathe jobs that were performed in 
the latter half of the training period. These jobs, in the opinion 
of the instructors, were the most difficult, and actually did allow 
the smaller tolerances. Furthermore, the differences in number 
of rejections favored the film group more in the latter jobs. It 
seems evident that: (a) the effect of the film method was cumula- 
tive and/or (b) the film method works best when the more compli- 
cated and more exacting operations are being taught. 

Two questions of practical importance arise from the data 
showing that those taught with films learned their lathe skills in 
a shorter time than those not so taught. The first is—did not 
the time spent in showing films make up for any time saved from 
practicing on the machines? The answer is that the film les- 
sons, lasting as they did from fifty to one hundred minutes, pro- 
longed somewhat that instructional period that came before the 
trainees began actual work on the lathe. This was in spite of the 
fact that instructor demonstrations to the non-film groups 
usually had to be longer and more numerous than those given to 
the film groups. However, the fact that most of the jobs required
two or more trials and that the comparisons between means are 
in terms of average time per successful trial suggests that the 
actual time saving due to films was greater than the differences 
between the means would indicate. Thus the saving of approxi- 
mately forty-two minutes per trial on Job 1-L, which required 
four trials, amounts to a saving of one hundred sixty-eight min- 
utes, or more than twice the time usually used in the film lesson 
on that job. Furthermore, three of the seven lathe films demon- 
strated more than one job. Finally, it should be obvious that 
even the straight substitution of film lessons for a part of the 
time on the machines would have the advantage of making it 
possible for more people to use the same machine. 
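
The arithmetic of this argument can be laid out in a few lines. The per-trial saving is the Job 1-L difference from Table 5, the four trials are as stated in the text, and the two lesson lengths are the limits of fifty and one hundred minutes mentioned above; the variable and function names are illustrative only.

```python
saving_per_trial = 41.98  # minutes saved per successful trial on Job 1-L (Table 5)
trials_required = 4       # trials required on Job 1-L, as stated in the text

total_saving = saving_per_trial * trials_required


def net_saving(total_saved, lesson_minutes):
    """Time saved on the machine less the time given to the film lesson."""
    return total_saved - lesson_minutes


print(round(total_saving))
for lesson in (50, 100):  # shortest and longest film lessons reported
    print(lesson, round(net_saving(total_saving, lesson), 1))
```

Even at the longest lesson length the net figure remains positive for this job, which is the point of the comparison.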

A second question is what did the film groups do with the 
time they saved? Several answers to this question were observed 
in conducting the experiment. Some students used their spare 
time to do supplemental exercises on the lathe. Some made such 
useful shop equipment as plug gages, boring bars, and so on.
Other students did extra work on blueprint reading and shop 
mathematics. A few assisted the instructors in record keeping 
and in giving individual help to slower students. Several were 
put on production at the Plant during or before their last week of 
training. This latter plan seemed sensible in view of the need for 
skilled workers. A suggested method of arranging film instruc- 
tion to encourage such progress according to ability has been 
discussed by the writer elsewhere. 

Why was the motion picture able to produce such marked 
results in the economy of time in learning lathe operations? A 
number of logical explanations will suggest themselves to anyone 
who has used films in teaching motor skills. In the first place, 
film demonstrations, although they cannot entirely supplant 
classroom demonstrations by the instructor, do have a number of 
distinct advantages over them. In a shop or classroom demon- 
stration there are always those who, being on the outer fringe of 
the group, can see and hear little of what is going on. All 
trainees can see the film equally well. By manipulating lighting 
and the camera’s position in relation to what is being shown it is 
possible to make the film emphasize or play down various ele- 
ments of any demonstration. Furthermore, in the dark projec- 
tion room by far the most dominant stimuli for attracting 
attention are the bright image on the screen and the clear voice 
of the commentator. In the classroom or shop there are almost 
always distractions to compete with the instructor for the group 
attention. Even the noise of the machine that is being demon- 
strated can be cut out in the film, but in real life it may drown 
out many of the instructor’s explanatory remarks. 

Several techniques peculiar to the motion picture medium give 
the film added advantage over the instructor in demonstrating 
shop skills. The film can show minute objects greatly enlarged. 
For example, the action of the finishing tool was demonstrated in 
one film. Motions may be speeded up or slowed down as 
they were in the film demonstrating the sequence of operations 
involved in threading. Non-essential and time consuming 
elements can easily be omitted in the film presentation without 
disturbing the sequence and continuity demanded in the com- 
pletion of the operation. Diagrams and animations can show 
machinery and processes that are difficult to demonstrate in the 
absence of such techniques. 

There may be a temptation for instructors, especially those 
with little experience or those who do not particularly like their 
work, to omit or slide over all or parts of demonstrations for all 
or parts of classes. The motion picture, on the other hand, 
usually represents the best organized and most complete presen- 
tation of a technique that can be developed. It never gets tired 
or bored; it is always ready to be used if it is administered and 
cared for properly. In the U. S. Office of Education films the
worker nearly always was shown just how the skills he was being 
asked to learn related to the war effort and to industry. He was 
made to feel that he would soon be helping to produce airplanes, 
jeeps, or tanks, and so on. It is difficult for the instructor, 
especially if he is not particularly verbal, to attach an equally 
strong emotional appeal to the work he is demonstrating. 

It would be a serious error to infer that the motion picture can 
ever take the place of the instructor. The film does have quali- 
ties within its scope of usefulness that the instructor alone 
cannot match, but to take the maximum advantages of these 
qualities in a total training situation requires the active partici- 
pation of an ambitious and resourceful instructor. 

Comparisons of Informational Gain.—In addition to comparing 
film and non-film groups on the basis of their speed in developing 
lathe skills, comparisons were made of the two groups in terms of 
the information gained during their training course. The mean 
technical information gain of the film group as determined by 
entering and leaving scores on the Purdue Test for Machinists 
and Machine Operators was 38.136. The corresponding gain 
for the non-film group was 19.034. This difference is 6.61 times 
its standard error, and is therefore statistically significant. The 
fact that the average difference between pre-test and end-test 
score in the film group was slightly more than twice that of the 
non-film group is rather striking. It is consistent in trend, how- 
ever, with the findings of Arnspiger, Rulon, and a number of other 
investigators. 

Implications of the Data for the Selection of Trainees.—Insofar 
as employers of machine operators accept the speed and accuracy 
criterion of success in machine operations, the data gathered in 
the present research may have some bearing on how to select 
employees for industrial training. The lowest correlations were 
obtained between the average sigma score and age, education, 
and measures of general intelligence and perceptual ability. This 
fact suggests the advisability of reexamining these commonly used 
criteria for the selection of prospective machine operators. Age, 
education, general intelligence, and so on may be valuable as 
indices of certain desirable qualities which are ignored in the 
speed and accuracy criterion employed in this research. It 
would, therefore, be unwise to interpret this study as completely 
invalidating such measures for the selection of employees for 
machine work. 

The three factors offering the highest correlations with the 
time criterion were motor ability, mathematical skill, and tech- 
nical knowledge. The multiple correlation for these three factors 
computed by the Doolittle¹ method was .448. Such a correlation,
though not extremely high, should enable the employer to
select with some certainty those likely to succeed in a machine 
operators’ training course. 
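
A multiple correlation of this kind can be obtained from the intercorrelations alone, as the sketch below illustrates. The Doolittle procedure is a systematic elimination scheme for solving the normal equations; the sketch uses ordinary Gaussian elimination, which yields the same solution, rather than the worksheet layout. The correlations shown are hypothetical, since the article does not report the individual coefficients, and the function names are illustrative only.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return x


def multiple_r(predictor_corr, criterion_corr):
    """Multiple correlation: R squared is the sum of beta weights times validities."""
    betas = solve(predictor_corr, criterion_corr)
    r_squared = sum(b * r for b, r in zip(betas, criterion_corr))
    return math.sqrt(r_squared)


# Hypothetical intercorrelations among three predictors, and their
# hypothetical correlations with the criterion (not taken from the article).
predictor_corr = [
    [1.0, 0.3, 0.2],
    [0.3, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
criterion_corr = [0.35, 0.30, 0.25]
print(round(multiple_r(predictor_corr, criterion_corr), 3))
```

The multiple correlation is never smaller than the largest single correlation with the criterion, which is why combining three moderately valid predictors can be worthwhile.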

Does the amount of technical information that a worker
possesses indicate how skillful he is in actually performing lathe
operations? In an effort to provide a partial answer to this
question the end-test scores made by individuals on the Purdue Test
for Machinists and Machine Operators were correlated with their
average sigma scores on the twelve lathe jobs required in the
training program. For the control group the correlation was
-.084, and for the experimental group, -.298. It is obvious
from these correlations that many workers who cannot answer
test questions about machine operations correctly can, nevertheless,
operate the machines in a satisfactory manner. It would
seem that too much weight should not be given the criterion of
factual knowledge in the selection of experienced lathe operators.

1 C. C. Peters and E. C. Wykes, “Simplified Methods of Computing
Regression Coefficients and Partial and Multiple Correlations,” Journal of
Educational Research, XXIII (May, 1931), 383-93.


CONCLUSIONS DRAWN FROM THE DATA 


The data gathered in this study are sufficient to justify a large 
measure of confidence in the following conclusions regarding lathe 
trainees whose instruction includes the carefully integrated use 
of motion picture films as compared with originally equivalent 
groups of trainees who do not have the opportunity to learn 
through films: 

a) They require on the average less time per successful trial 
on the projects they are asked to turn out on the lathe. 

b) They do not sacrifice accuracy for speed in their work, for 
they do not on the average turn in more under-size jobs. 

c) They learn significantly more factual information on lathe 
operation. 

d) In view of (a), (b) and (c) it should be possible to shorten 
their training period by including motion pictures in their instruc- 
tion in the manner described in this study. 

The data gathered in this study justify some confidence in the 
following statements: 

a) Insofar as their learning problems are similar to those of 
lathe trainees, time can be saved in the training periods of other 
types of machine operators through the use of motion pictures as 
training aids. 

b) Films are even more effective in saving time in teaching 
the more complicated skills and those in which a greater degree 
of precision is required, than they are in teaching less complicated 
and less exacting skills. 

c) A technique of film use involving careful timing and integra- 
tion with other teaching; planned motivation, discussion, and 
evaluation in the actual film presentation; and efficient, unobtru- 
sive manipulation of mechanical details is an effective method of 
visual teaching. 

The data gathered did not bear out the subjective judgment 
that lathe operators trained with films would like their work bet- 
ter than those whose training did not include films. 

Incidental findings supported by the data include: 

a) The trainees rate motion pictures superior to the lecture and 
reading in “helpfulness in learning,” but inferior to teacher 
demonstrations and actual work on the machines. 

b) The best predictors of success in learning to perform lathe 
operations quickly and accurately are measures of motor ability, 
mathematical skill, and technical knowledge. 

c) Trainees consider lathe tool grinding, aligning lathe centers 
without indicators, and turning tapers with the tailstock set over 
as their most difficult learning tasks. 


SUGGESTIONS FOR FUTURE RESEARCH 


The following are suggested as promising areas for fruitful and 
useful research on the subject of visual education in industrial 
training: 

While the method used in the present research proved success- 
ful, there is no reason to feel that it cannot be improved. 
Furthermore, it required an additional man, the experimenter, in 
order to carry it on. To what extent can such a method be 
modified, made simpler and more convenient without losing its 
effectiveness? Such a question would be a fruitful one for 
research. 

The present research was carried on with groups. The whole 
question of values and methods of film teaching in individualized 
training remains to be explored. The administrative arrange- 
ments suggested in the previous paragraphs seem to have some 
promise in promoting the effective use of films in both group and 
individualized training programs. However, objective data on 
the effectiveness of such arrangements might well be gathered. 

Another problem of significance is that of the values of visual 
aids other than motion pictures. There probably is no question 
of films versus other visual aids and techniques apart from the 
specific curriculum purposes for which each is used. 

However, the psychological and curriculum values peculiar to 
the various types of visual aids have not been demonstrated 
scientifically. A related question deserving of the attention of 
research workers is the matter of what teaching jobs films them- 
selves can do best. The present research offers only a hint in 
this connection. 

What kind of concepts and skills are best taught by films, how 
do films influence various types of learners, and how do films 
influence the homogeneity of groups of learners? These are 
questions of practical importance that have not been adequately 
investigated. 

To what extent can the findings of this study which concerned 
itself with engine lathe workers be generalized to apply to the 
training of operators of other machines, such as the milling 
machine, the planer and shaper, and so on? To what extent can
the findings of a study of the use of films in the training of 
machine operators be generalized to other trades such as ship- 
building, airplane construction, automotive and airplane engine 
repair, and so on? These are questions which seem to the 
investigator to be of critical importance, and worthy of research.

If the present research were to be repeated, two changes might 
well be made. The first is that it might better be conducted 
with larger numbers of trainees. This should provide some- 
what greater statistical reliability. The second change would 
be to require sufficiently more trials on each job so that it would 
be possible to get an adequate notion of the nature of the learning 
curve for each lathe operation. Such a procedure would per- 
haps make it possible to discover whether the advantages 
demonstrated by the film in the present research were due only 
to an initial acceleration of the learning curve. 

If this were true, the non-film groups would eventually over- 
take the film groups. If, on the other hand, the effect of film 
teaching were cumulative, the differences favoring the film group 
would persist relatively without reference to the length of the 
training period. 








COMPARATIVE SPEED OF JOINED 
AND UNJOINED WRITING STROKES 


GERTRUDE HILDRETH 


Horace Mann-Lincoln School 
Teachers College, Columbia University 


When cursive handwriting was the prevailing style in school 
instruction there was no question of shifting to a different style 
in the middle grades. Today the situation is rapidly changing, 
for manuscript (print style) writing is being rapidly adopted as 
the standard style in the primary grades’, and from that point on 
wide variability in practice prevails. 

Some authorities recommend that children who have once 
begun manuscript writing should continue with the same style 
throughout their lifetime. Others advocate changing to cursive 
style because of the belief that manuscript writing is too slow to 
be practicable for upper-grade pupils and for daily use in adult 
life. The pros and cons of this argument have been summarized 
elsewhere’. 

Classroom teachers and remedial workers are discovering that 
large numbers of school children today tend to make their own 
decisions about the style of writing to adopt from the middle 
grades onward. In some cases the decision is made for them by 
their parents. A considerable number of children have learned 
both cursive and manuscript styles, use them interchangeably, 
and do not settle on any one style. These pupils often constitute 
handwriting problems for their teachers because of inferior 
results shown in their written work. 


THE ARGUMENT ABOUT SPEED 


Most upper-grade teachers and others who give the matter 
any thought have the idea that manuscript writing is much slower 
than cursive style writing in the upper grades and high school, and
is impractical for older children for that reason. A summary of
various experiments shows that there is little real difference 
between the two styles in writing rate when the conditions of 
instruction and amounts of practice have been equal’. 

Reasoning theoretically, one might even expect unjoined writing
to be faster. In order to check on this point the following
experiment was conducted. Upper-grade school children were
given tests to determine the number of joined and unjoined ‘up
and down’ pencil strokes they could make on a line in a short 
time interval. Such a test measures motor skill and manual 
dexterity similar to that required in writing, but it obviates the 
training effect due to having practiced a certain style of writing. 


THE EXPERIMENT 


The Subjects.—The subjects were seventy eighth-grade pupils, 
twenty-seven boys and forty-three girls, in an experimental 
school. They had a median chronological age of 13-0 years and 
a median IQ of approximately 123 on standard tests. The age 
distribution is shown in Table 1. 


[Figure 1: sample rows of crosses and of writing strokes shown to the pupils.]


FIG. 1.—Illustrations for the Stroke Tests.


TABLE 1.—CHRONOLOGICAL AGE DISTRIBUTION


Age............ 12:0  12:1  12:2   12:3  12:4  12:5  12:6  12:7  12:8
Number.........    1     2     1      1     3     2     5     3     3
Age............ 12:9  12:10 12:11  13:0  13:1  13:2  13:3  13:4
Number.........    5     2     7      2     5     4     3     7
Age............ 13:5  13:6  13:7   13:8  13:9  13:10 13:11 14:0
Number.........    3     1     5      3     0     1     1     0
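The reported median age can be checked against Table 1. The following is a minimal sketch, assuming that each tabulated age (years:months) stands for a one-month class interval beginning at that age and that the usual interpolation formula for grouped data applies; the paper does not say how the median was obtained, and the function name is illustrative only.

```python
# Age distribution from Table 1, keyed by (years, months).
counts = {
    (12, 0): 1, (12, 1): 2, (12, 2): 1, (12, 3): 1, (12, 4): 3,
    (12, 5): 2, (12, 6): 5, (12, 7): 3, (12, 8): 3, (12, 9): 5,
    (12, 10): 2, (12, 11): 7, (13, 0): 2, (13, 1): 5, (13, 2): 4,
    (13, 3): 3, (13, 4): 7, (13, 5): 3, (13, 6): 1, (13, 7): 5,
    (13, 8): 3, (13, 9): 0, (13, 10): 1, (13, 11): 1, (14, 0): 0,
}


def grouped_median_months(freq):
    """Interpolated median in months, assuming one-month class intervals."""
    items = sorted((years * 12 + months, n) for (years, months), n in freq.items())
    half = sum(n for _, n in items) / 2
    below = 0  # cases falling below the current interval
    for lower_limit, n in items:
        if n and below + n >= half:
            return lower_limit + (half - below) / n
        below += n
    raise ValueError("empty distribution")


total = sum(counts.values())
median = grouped_median_months(counts)
years, months = divmod(median, 12)
print(total, int(years), months)
```

Under these assumptions the tabulated frequencies sum to the seventy pupils named in the text, and the interpolated median falls at thirteen years.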


The Tests.—The pupil was first shown either a row of joined
or unjoined strokes (Figure 1) and was told to make a similar row
of strokes on his paper, beginning when the stop-watch clicked 
and stopping at the final click. The same directions were given 
for the second stroke test. The examiner alternated the joined 
and unjoined stroke tests, pupils A, C, E, G, etc. being given the 
unjoined test first; pupils B, D, F, H, etc. being given the joined 
test first. This alternation was followed to eliminate practice
effect from the first to the second test in the final results. The
time limit for each of the two stroke tests was fifteen seconds. 
When the pupils were shown the sample rows of joined and 
unjoined strokes they were told they might slant their strokes in 
any direction that seemed most natural for them or make strokes 
straight up and down; there was no need to reproduce the slant 
shown in the sample. 
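The alternation of test order described above is a simple counterbalancing scheme. The sketch below illustrates it under the assumption that pupils were tested in a fixed sequence; the pupil labels and the function name are hypothetical.

```python
def assign_order(pupils):
    """Alternate the order of the two stroke tests from one pupil to the next.

    The first, third, fifth ... pupil takes the unjoined test first;
    the second, fourth, sixth ... pupil takes the joined test first.
    """
    orders = {}
    for index, pupil in enumerate(pupils):
        if index % 2 == 0:
            orders[pupil] = ("unjoined", "joined")
        else:
            orders[pupil] = ("joined", "unjoined")
    return orders


# Hypothetical labels for the seventy eighth-grade pupils.
pupils = [f"pupil_{i + 1}" for i in range(70)]
orders = assign_order(pupils)
unjoined_first = sum(1 for order in orders.values() if order[0] == "unjoined")
joined_first = len(pupils) - unjoined_first
print(unjoined_first, joined_first)
```

With seventy pupils the scheme places thirty-five in each order, so any gain from the first test to the second falls equally on the joined and the unjoined scores.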

In preparation for the timing of the stroke tests and as a 
‘warming up’ device the pupil was asked to make a row of x’s 
across the test paper similar to the sample row of crosses in 
Figure 1. The timing was fifteen seconds with the stop-watch. 
Nothing was said about hurrying, but the sight and sound of the 
stop-watch tended to motivate rapid work. 

Preceding the stroke tests a two-minute sample of each pupil’s 
handwriting was taken. The directions for the Ayres Measuring
Scale for Handwriting were used for this test.* 

The pupil was given a copy of a mimeographed sheet with part 
of the Gettysburg address typed at the top. The rest of the 
sheet consisted of blank lines. The pupil first wrote his name, 
grade and the date at the top of the page. Then he was told to 
copy the typed material at his usual rate and in his accustomed 
style. About a minute was allowed for practice. Next followed 
the actual test in which the pupil wrote the same material for two 
minutes. The clicking of the stop-watch gave the beginning and 
stop signals. 

All tests were given individually in the school psychological 
laboratory. Note was taken of left-handedness. Although all 
the writing and stroke tests were done with pencil, there is every 
reason to suppose that comparable results would be obtained 
with fountain pens. The children were greatly interested in the 
tests and cooperated well. Following the tests each pupil was
interviewed briefly about his school experiences with writing. 

In order to tie up the experiment with the children’s actual 
writing in school exercises, the examiner went over the pupil’s 
results with him indicating his scores, pointing out good and 
poor characteristics in his writing, and giving suggestions for 
improvement. 





*Leonard P. Ayres. Measuring Scale for Handwriting. New York: 
Russell Sage Foundation. 







At about the same time, a group of adults, all primary-school 
teachers most of whom had learned to do manuscript writing in 
recent years in order to instruct primary-school children, was 
given the stroke tests. The teachers were tested as a group. 
The unjoined stroke test was given first; the joined stroke test 
second. Any practice effect from the first to the second test 
would be reflected in the results of the joined-stroke test. 


RESULTS OF THE TESTS 


Joined and Unjoined Strokes.—The results of the stroke tests 
for the eighth-graders are shown in Tables 2 and 3. The medians
are: Unjoined strokes, 44.3; joined strokes, 40.7. Q was the 
measure of variability computed for the distributions. For 
unjoined strokes Q proves to be 6.875; for joined strokes 10.875. 
The difference between the medians was 3.6 strokes. 
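The measure Q is not defined in the text. The sketch below assumes it is the quartile deviation (half the distance between the first and third quartiles); the quartile values 37.25 and 51.00 are those given in Table 2 for unjoined strokes, all pupils, and the function name is illustrative.

```python
def quartile_deviation(q1, q3):
    """Semi-interquartile range: half the spread between the quartiles."""
    return (q3 - q1) / 2


# Quartiles for unjoined strokes, all pupils (Table 2).
q_unjoined = quartile_deviation(37.25, 51.00)

# Difference between the two medians reported in the text.
median_difference = round(44.3 - 40.7, 1)

print(q_unjoined, median_difference)
```

On this reading the computed value agrees with the Q of 6.875 reported for unjoined strokes.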

The reliability of this difference was tested by the formula for
correlated scores:


$$PE_D = \sqrt{PE_{med_1}^2 + PE_{med_2}^2 - 2\,r_{12}\,PE_{med_1}\,PE_{med_2}}$$


The results are as follows: 


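Med_1 (unjoined strokes).................. 44.3
Med_2 (joined strokes).................... 40.7
PE_med_1.................................. .944
PE_med_2.................................. 1.61
Q, unjoined strokes....................... 6.875
Q, joined strokes......................... 10.875
r_12...................................... .729
PE_D...................................... 1.126
Critical ratio, D/PE_D.................... 3.2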


Since D is 3.2 times as large as PE_D, there is considerable
assurance that this is a true difference. 
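The arithmetic behind the critical ratio can be retraced from the figures in the results table. The sketch below applies the formula for correlated scores to the reported probable errors and correlation; the function and variable names are illustrative and not the author's.

```python
import math


def pe_of_difference(pe_1, pe_2, r_12):
    """Probable error of the difference between two correlated medians."""
    return math.sqrt(pe_1 ** 2 + pe_2 ** 2 - 2 * r_12 * pe_1 * pe_2)


# Values as reported: PE of each median and the correlation between the tests.
pe_d = pe_of_difference(0.944, 1.61, 0.729)

# D is the difference between the medians, 44.3 and 40.7 strokes.
critical_ratio = (44.3 - 40.7) / pe_d

print(round(pe_d, 3), round(critical_ratio, 1))
```

The substantial correlation between the two tests reduces the probable error of the difference; with r set to zero the same probable errors would give a larger PE_D and a smaller ratio.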

Variability was wider in the joined-stroke test than in the 
unjoined, although the highest scores were made in the unjoined- 
stroke tests. 

The difference in joined and unjoined strokes made by boys
and girls separately is negligible. The medians were:


Unjoined strokes: Boys................... 43.8
                  Girls.................. 44.75
Joined strokes:   Boys................... 41
                  Girls.................. 40.5








TABLE 2.—JOINED AND UNJOINED STROKE TESTS—EIGHTH GRADE PUPILS

[Frequency distribution by number of strokes omitted; summary rows only.]

Group                                  No.   Median   Q1      Q3      Q
Unjoined strokes: Boys                 27    43.8
                  Girls                43    44.75
                  All                  70    44.3     37.25   51.00   6.875
Joined strokes:   Boys                 27    41
                  Girls                43    40.5
                  All                  70    40.7                     10.875
Unjoined first                         35    43.75
Unjoined second                        35    45
Joined first                           35    40.5
Joined second                          35    42.5
Manuscript writers: Unjoined strokes   28    45.5
                    Joined strokes     28    40.8
Cursive writers:    Unjoined strokes   42    43.5
                    Joined strokes     42    41.5


Although any practice effect from the first stroke test to the 
second was eliminated by alternating the order of presentation 
in the group, there appears to be little difference in results due 
to order of presentation: 


Median unjoined test first—all pupils...... 43.75
Median unjoined test second—all pupils.... 45.00 
Median joined test first—all pupils........ 40.5 
Median joined test second—all pupils...... 42.5 


Negligible differences are found between the number of strokes,
joined and unjoined, made by manuscript writers on the one 
hand and the cursive writers on the other. 


Median for manuscript writers—unjoined strokes......... 45.5 
Median for cursive writers—unjoined strokes............ 43.5 
Median for manuscript writers—joined strokes........... 40.8 
Median for cursive writers—joined strokes............... 41.5 


In view of the greater practice manuscript writers have had in 
making unjoined letters, and the cursive writers have had in 
making joined letters, one might have anticipated more differ- 
ence. However, several manuscript writers among the group 
appeared to work laboriously in making joined strokes. The 
correlation between joined and unjoined strokes computed by 
the Pearson product-moment formula proved to be


r = .729 ± .04


The correlation between speed of strokes, joined and unjoined 
combined, and speed of writing measured by the Ayres test 
proved to be 


r = .544 ± .06
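The figures following each coefficient appear to be probable errors. The sketch below assumes the conventional formula PE_r = .6745(1 − r²)/√N with N = 70; the paper does not state which formula was used, so this is a plausibility check rather than the author's computation.

```python
import math


def probable_error_of_r(r, n):
    """Probable error of a product-moment correlation (conventional formula)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


N_PUPILS = 70  # the seventy eighth-grade pupils

pe_strokes = probable_error_of_r(0.729, N_PUPILS)  # joined vs. unjoined strokes
pe_writing = probable_error_of_r(0.544, N_PUPILS)  # strokes vs. Ayres writing speed

print(round(pe_strokes, 2), round(pe_writing, 2))
```

Rounded to two places, both values agree with the figures printed beside the coefficients.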


The factors that tend to reduce the correlation between the 
two stroke tests are chiefly the brevity of the test, differences in 
warming up, and individual idiosyncrasies in the children. 

The question might be raised whether fifteen seconds’ time 
yielded adequate samples of each pupil’s true capacity to make 
joined and unjoined strokes. Results show that the rows of 
unjoined strokes tended to average about five inches in length; 
joined strokes about an inch longer. A considerable number of
children required more than the width of a full line, seven inches,
for the stroke tests. To have allowed much more time might have
introduced fatigue. It was impractical to attempt to give addi- 
tional series of stroke tests to each child. 


ADULT TEST RESULTS 


The median number of unjoined strokes made by the adult 
group was 50.73; the median for joined strokes, 51.00—a negligi- 
ble difference. The range in the number of strokes individuals 
made was wider in the joined-stroke test. These results are 
shown in Table 3. The long years of practice the adults have
had in cursive style writing no doubt account for the negligible
difference found in making joined and unjoined strokes as com-
pared with the children’s results. However, the number of
cases is too small to permit final conclusions on this point.


TABLE 3.—WRITING STROKE TESTS—ADULTS

[Frequency distribution of unjoined and joined strokes for the 28 adults, in class intervals from 32 to 70 strokes.]

Median: Unjoined strokes, 50.73; joined strokes, 51.00.


RESULTS OF THE AYRES WRITING TEST 


As a group, these eighth-grade pupils acquitted themselves 
well on the Ayres Test. Results are shown in Table 4. 

There were twenty-eight who wrote in manuscript style, forty 
who tended to join all their letters, and two who joined more 
often than not. The latter two were classed with the cursive 
style writers. Only four of the twenty-eight manuscript writers 
were boys. 

The median speed (letters a minute) written by the manu- 
script writers was 80.17; by the cursive writers 85.5. Q for the 
manuscript writers was 13.5 letters; for cursive writers 9.79. 

The Ayres norms for cursive writing are as follows: Grade 
VII—seventy-six letters a minute; Grade VIII—seventy-nine 
letters a minute. Seventy-seven letters a minute may be con- 
sidered ‘normal’ for the beginning of Grade VIII. 














TABLE 4.—SPEED OF WRITING—AYRES TEST
(Number of letters in a minute)

[Frequency distribution, 44 to 114 letters a minute, omitted; summary rows only.]

Group                  No.   Median   Q1      Q3      Q
Manuscript: Boys        4
            Girls      24
            All        28    80.17                    13.5
Cursive:    Boys       23
            Girls      19
            All        42    85.5     75.17   94.75   9.79
By age groups: 12-0    10    80.5
               12-6    25    86
               13-0    24    89.5
               13-6    11    78.5


As a group, these pupils wrote faster than typical eighth- 
graders in comparison with the Ayres test norms for the begin- 
ning of Grade VIII, regardless of whether they wrote in 
manuscript or cursive style. 

Of those writing more than one hundred letters a minute, six 
were manuscript writers and eight wrote cursive style—about 
the same proportion as the total number of pupils who used each 
style. 

By age groups the medians for speed of writing were: 12-0, 
80.5; 12-6, 86; 13-0, 89.5; 13-6, 78.5. The poor showing of the 
pupils in the 13-6 age group is due to the fact that these pupils 
tend to be relatively slower in general than younger pupils in the 
same grade. 

The Ayres norms for speed of writing indicate that throughout 
the age range, Grade I through Grade VIII, there is a correlation 
between age and rate, an indication that learning to write is 
developmental in character. 

In both the writing and stroke tests a wide range in speed among
the group was shown. The range in the Ayres test was from 
forty-five to one hundred fifteen letters a minute, the fastest 
writer being 2.55 times as rapid as the slowest writer. The num- 
ber of unjoined strokes made by the slowest child was eighteen; 
by the fastest, sixty-nine, a ratio of 3.9 to 1. Nearly as wide a 
range was shown in the joined-stroke tests. Differences in age, 
training, and experience contribute to variability within the 
group, but there are also psycho-physical factors residing in
the make-up of the individual pupils that undoubtedly affect the
speed of movement. Speed of writing seemed to have more rela- 
tion to speed of movement in general as a personal trait of the 
writer than to the style of handwriting adopted. 

All seventh-grade pupils in the school, sixty-eight in number, 
were given the same writing test. Half were manuscript 
writers, half cursive style writers. The median speed for the 
first group was 70.5 letters a minute; for the second, 70 letters. 

The fastest workers on all the tests were those who showed 
decided rhythmical tendencies, a tendency to write letters 
within words and strokes in groups: ‘tat-a-tat-tat,’ rather than 
‘tat-tat-tat.’ The rhythmical workers made brief pauses regu- 
larly throughout the tests. 

During observations of the children at work and from a study 
of their test results, it was interesting to note that the brief
tests of making strokes and crosses tended to be symptomatic of
the child’s total writing behavior. 


QUALITY OF WRITING 


The cursive writers were normal in quality according to the 
Ayres standards. Those with best quality had developed a 
style for practical utility rather than ‘perfect penmanship.’ 
Several who wrote at high speed achieved good quality. A few 
showed very poor quality, rating little better than ‘third or 
fourth grade’ in their own estimate. The chief defects were 
uneven touch and pressure that resulted in writing that was too 
light to read easily, poor letter formation, uneven, irregular
alignment, and inconsistent slant. Left-handedness interfered with
the writing speed and quality of several children; a number were 
tense and visibly under strain while writing; and several adopted 
a posture that was inimical to efficient writing habits. 

There were no comparable standards for rating the quality of 
writing done by the manuscript writers. In general their writing 
was superior in legibility to that of the cursive writers. Letters 
were neatly made and evenly spaced, there was good spacing 
between words, direction of strokes was consistent and align- 
ment was even. The manuscript writers appeared to work as 
smoothly as the cursive style writers and with no more effort. 
The products of the slower manuscript writers usually showed 
infantile characteristics such as poor spacing between letters and 
words, variations in the style of the letters, tendency to jab the 
paper, uneven pressure and the like. 

The poorest writers proved to be ‘change-over’ cases who had 
had insufficient drill in cursive script writing to establish good 
habits. Within the same writing sample they frequently showed 
combinations of the two styles. 

Most of the cursive script writers were children from the public 
schools who had received systematic handwriting instruction 
through the upper grades. Judging from the details of their 
writing histories no more than half the manuscript writers had 
been given writing instruction beyond the third grade. 

If the difference in variability found between speed of writing 
cursive and manuscript styles is reliable, these results would sug-
gest that the wider variability in speed of writing manuscript
style reflects less uniformity within the group in kind and amount
of instruction. 

Some of the children told about the great struggle they had 
learning to write. Several left-handed children, half changed 
over, and those who had fluctuated between manuscript and 
cursive style writing had experienced the most difficulty. One 
child had not only been shifted from left to right hand, but he 
had been required to change his style of writing at the same time. 


CONCLUSIONS AND RECOMMENDATIONS 


Since the test results show that making unjoined strokes is 
faster for older children than making joined strokes, it is reason- 
able to infer that with equal amounts of practice and equally 
thorough teaching throughout the elementary-school grades 
manuscript writing can be as fast as, if not faster than, joined- 
letter writing. 

Of course neither manuscript nor cursive writing consists 
merely of joined or unjoined up and down strokes. Letters such 
as ‘s,’ ‘e,’ ‘f,’ and ‘z’ require horizontal strokes in manuscript 
style writing. It is these letters requiring sharp angles that Gray (2)
found to be more time-consuming compared with the correspond- 
ing letters in cursive style writing. The remaining letters of the 
alphabet, however, require similar direction of strokes in the two 
styles. 

We are not concerned solely with the question as to whether 
one style is faster than the other, but must inquire: (1) How fast 
can handwriting that employs the Roman alphabet be done and 
still be highly legible? (2) Can the standards that are set up 
for speed and quality be attained with manuscript writing? 

These results suggest that manuscript writing can be fast 
enough in the upper grades for all practical purposes and that 
children who first learn manuscript writing in the primary grades 
would do well to continue in that style. 

In the busy world today business people, ticket agents, sales- 
people and others are required to ‘print’ the papers they fill out. 
Manuscript writing, which is superior in all ways to ‘printing’ in 
capital letter style, meets the requirement for high legibility and 
is sufficiently rapid. 

To achieve economy in learning it is recommended that all 
children who are to learn to read and write material employing
the Roman alphabet be taught manuscript writing. Then the
material they write by hand and on the typewriter will correspond
with the handwritten and printed material they read.

Additional research is recommended to verify the findings
reported here.


SUMMARY 


Seventy eighth-grade children in an experimental school were 
given a series of tests: making series of joined and unjoined ver- 
tical strokes horizontally across the paper as in handwriting, and 
the Ayres handwriting tests. 

A substantial difference was found between rate of joined and 
unjoined strokes in favor of the latter. 

Only a negligible difference was found in rate on the stroke 
tests between manuscript and cursive writers. 

A group of twenty-eight primary-school teachers showed a 
negligible difference in rate on the two stroke tests. 

On the Ayres handwriting test both cursive and manuscript 
writers rated on the average above the Ayres norms for cursive 
writing at the beginning of Grade VIII. However, the cursive 
writers were slightly faster than the manuscript writers, a differ- 
ence that can be attributed to differences in amounts of practice. 

Because of the analogy between the task of making joined and 
unjoined writing strokes, and cursive and manuscript writing, 
respectively, the latter may be assumed to be as rapid as, if not 
more rapid than, cursive style writing, provided amounts of prac- 
tice and instructional emphasis are equivalent. 


REFERENCES 


1) Beale, Beulah. “Trends in Handwriting.” Baltimore Bulletin of
Education, 1944, xxii, 29-32.

2) Gray, William Henry. “An Experimental Comparison of the
Movements in Manuscript and Cursive Writing.” Journal of Educa-
tional Psychology, 1930, xxi, 259-72.

3) Hildreth, Gertrude. “Should Manuscript Writing be Continued
in the Upper Grades?” Elementary School Journal, 1944, xlv, 85-93.








THE EFFECT OF CHOICE PLACEMENT 
ON THE DIFFICULTY 
OF MULTIPLE-CHOICE QUESTIONS* 


LT. WALTER J. McNAMARA, U.S.N.R. 
AND 
LT. ELLIS WEITZMAN, U.S.N.R. 


Central Examining Board, Naval Air Training Command, Pensacola, Florida 


The belief is generally held that the chance, or ‘guess,’ element 
in five-choice and four-choice objective test questions is one in 
five for the former and one in four for the latter. Those engaged 
in the construction of this type of examination usually proceed 
upon the assumption that an entirely naive subject should be
expected to obtain a score of twenty-five per cent on an examina- 
tion consisting of four-choice items, or twenty per cent on an 
examination composed of five-choice items. Concomitant with 
this belief, the viewpoint is generally held that the correct choices 
should be scattered among the four or five positions in such a way 
that the subject is not able to discover any particular pattern 
which might guide him to score higher than his understanding 
of the subject-matter warrants. The possibility that the place- 
ment of the correct choice in any one of the four or five possible 
positions has a definite and mensurable effect upon the difficulty 
of the question has not, to the knowledge of the writers, been 
subjected to serious investigation. 

The writers were led into the present study by the feeling that 
not only might such a position factor exist, but that it might 
conceivably be present to an extent sufficient to make a significant 
difference in the results obtained by the questions. It also 
appeared obvious that large numbers of test items, administered 
to large numbers of subjects, would be desirable in order to ‘bal- 
ance out’ any variations which might be inherent in any par- 
ticular set of questions or subjects. Fortunately, both were 
available as a result of data gathered in the extensive examining 
program conducted by the Central Examining Board among 
naval aviation cadets. 





* The opinions or assertions contained herein are the private ones of the 
writers and are not to be construed as official or reflecting the views of the 
Navy Department or the naval service at large. 



DERIVATION OF THE DATA 


The test items selected for this study were all either four- or 
five-choice items, consisting of an incomplete statement which
might be completed to make a true statement by correctly select- 
ing one of the four or five phrases following it. 

The following are samples of the questions used in the tests: 


A pilot watches the following sequence of clouds overhead: 
cirrus, cirrostratus, altocumulus, nimbostratus, cumulonim- 
bus. He should know that a 


1—warm front is approaching, with air stable.
2—fast-moving cold front is approaching.
3—warm front is approaching, with air unstable.
4—slow-moving cold front is approaching, with air
unstable.


A pilot pushes the stick forward quickly to recover from a 
stall. This will immediately cause the airplane to 


1—recover with the least loss in altitude.
2—regain its angle of maximum lift.
3—level off.
4—go into an uncontrollable spin.
5—lose altitude due to loss of lift.


Test items had been developed to cover the subject-matter 
fields of Mathematics, Physics, Principles of Flying, Aerology, 
and Operation of Aircraft Engines. These five courses were 
taught as part of the initial ground school training of naval avi- 
ation cadets in the then twenty Navy Flight Preparatory Schools. 
Uniform weekly and final examinations had been provided for 
each of these subjects by the Board.¹ Each examination was
composed of forty questions, this number being used to result in
easy scoring, since the Navy employs 4.0 as a perfect score (with
2.5 required for passing) as its basic system.
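The scoring arrangement can be made concrete with a short sketch. It assumes that each of the forty questions carries equal weight, so that one question is worth 0.1 on the 4.0 scale; that weighting is inferred from the text rather than stated in it, and the function names are illustrative. The chance expectation follows the commonly held belief that a naive subject answers one question in four, or one in five, correctly.

```python
QUESTIONS = 40
PERFECT_SCORE = 4.0
PASSING_SCORE = 2.5

# Assumption: every question counts equally toward the 4.0 maximum.
POINTS_PER_QUESTION = PERFECT_SCORE / QUESTIONS


def navy_score(correct):
    """Score on the 4.0 scale for a given number of correct answers."""
    return round(correct * POINTS_PER_QUESTION, 1)


def expected_chance_correct(questions, choices):
    """Number of items a purely guessing subject is expected to get right."""
    return questions / choices


print(navy_score(25), navy_score(40))
print(expected_chance_correct(QUESTIONS, 4), expected_chance_correct(QUESTIONS, 5))
```

On this reading a cadet needs twenty-five correct answers to pass, whereas pure guessing would be expected to yield only ten on a four-choice test and eight on a five-choice test.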

The subjects, naval aviation cadets, were males in their late
teens or early twenties. In order to qualify as aviation cadets
they had been required to pass through a careful screening
process as to educational background, mental ability, mechanical
aptitude, and physical condition, to mention some of the chief
bases of selection. In general, it may be stated that they formed
a highly selected group of young men of much better than aver-
age mental and physical endowment.

¹ E. Weitzman and R. C. Bedell. “The Central Examining Board for
the Training of Naval Air Cadets.” Psychol. Bull., 1944, 41, 57-59.

For this investigation, the writers selected 8,692 multiple-choice 
questions of which 4,774 had five choices and 3,918 had four 
choices. Each of the tests from which these items were drawn 
was composed of either four-choice or five-choice items, since in 
no single test were both types of questions used. These test 
questions had been administered to a total of 664,088 subjects. 
The number of subjects for the five-choice items was 364,705, 
and for the four-choice items was 299,383. Because each avi- 
ation cadet had taken many of the different examinations 
involved in the study (weekly tests in each of the five courses, 
plus final examinations), the number of subjects stated does not 
indicate that number of aviation cadets. Since, however, there 
was only one answer sheet for each cadet for each test, the num- 
ber of subjects for the test questions analyzed may be regarded, 
legitimately for the purposes of this study, as that given above. 


TABLE 1.—FREQUENCY WITH WHICH EACH POSITION WAS USED
FOR THE CORRECT CHOICE

             4-Choice Items        5-Choice Items
Choice       No.    Per Cent       No.    Per Cent
   1         969      24.7        1008      21.1
   2         976      24.9         932      19.5
   3        1038      26.5        1032      21.6
   4         935      23.9         981      20.5
   5                               821      17.2
Total:      3918                  4774

In the original construction of the test items, no restriction 
had been placed upon the eighteen or twenty test constructors as 
to the placement of the correct choice, other than the caution 
that a fairly equal number of correct choices in each test should 
be placed in the four or five available positions. Table 1 shows
the number and percentage of correct choices placed in each of
the positions in the test items studied. 

It is interesting to note that when given wide latitude in the 
placement of correct choices in multiple-choice test questions 
there appears to be a slight tendency to prefer the third position 
in both four- and five-choice items, whereas the smallest percent- 
age is in both instances found in the last position. 
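The percentages in Table 1 follow directly from the item counts. The sketch below recomputes them and picks out the most and least used positions; the variable and function names are illustrative.

```python
# Number of items with the correct choice in each position (Table 1).
four_choice = {1: 969, 2: 976, 3: 1038, 4: 935}
five_choice = {1: 1008, 2: 932, 3: 1032, 4: 981, 5: 821}


def position_percentages(counts):
    """Per cent of items whose correct choice falls in each position."""
    total = sum(counts.values())
    return {pos: round(100 * n / total, 1) for pos, n in counts.items()}


for label, counts in (("4-choice", four_choice), ("5-choice", five_choice)):
    pct = position_percentages(counts)
    most_used = max(pct, key=pct.get)
    least_used = min(pct, key=pct.get)
    print(label, pct, "most used:", most_used, "least used:", least_used)
```

In both item types the third position is used most often and the last position least often, as the text observes.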

An item analysis had been made for each of the questions,
following techniques previously presented.¹ Analysis data indi-
cate the percentage of cadets selecting each of the choices as cor-
rect. Items were studied first for the subject-matter fields
Mathematics, Physics, and Principles of Flying; then for the
subject-matter fields Aerology and Airplane Engines. Presented
in Table 2 are the percentages selecting the correct choice when
it appeared in each of the four or five positions. The data for
four-choice questions in both subject-matter groups show a slight
increase in difficulty from the first through the third position,
then a decrease in difficulty for the fourth or last position.


TABLE 2.—PERCENTAGE DIFFICULTY OF THE VARIOUS POSITIONS
FOR TWO SUBJECT MATTER GROUPS

             Math, Physics, Flying       Aerology, Engines
                  Percentage                 Percentage
Position     4-Choice    5-Choice       4-Choice    5-Choice
   1           76.6        77.0           78.2        75.1
   2           75.5        76.5           78.1        77.9
   3           74.3        76.4           75.3        78.2
   4           74.8        74.7           80.0        76.6
   5                       75.9                       76.4

Data for the five-choice items are not similar for the two sub-
ject-matter groups. For the Mathematics, Physics, and Flying
group, there is a consistent increase in difficulty from the first
through the fourth position, with a decrease in difficulty for the
last position. The data for the Aerology and Engines group,
however, differed in that the third position was the easiest, the
first position most difficult, and the last position slightly more
difficult than the next-to-the-last position.

¹ E. Weitzman and W. J. McNamara. “Techniques Used in Analyzing
the Learning Achievement of Naval Aviation Cadets.” J. educ. Psychol.,
1944, 35, 181-185.

The data for all subjects were then combined, resulting in the 
position differences presented in Table 3. Table 4 gives the per- 
centage differences between the various positions in the two types 
of questions, as well as the critical ratios for these differences. It 
is seen that when all data are combined the differences are small 
but statistically significant in twelve of the sixteen pairs of 
comparisons. 


TABLE 3.—THE PERCENTAGES OF SUBJECTS SELECTING THE
CORRECT CHOICE WHEN IT APPEARS IN A CERTAIN POSITION

              4-Choice Questions            5-Choice Questions
Position    Number of    Percentage      Number of    Percentage
            Subjects*                    Subjects*
   1          74,993        77.4           76,871        76.3
   2          74,811        76.6           71,843        77.1
   3          77,566        74.7           79,555        77.1
   4          72,013        77.0           73,970        75.4
   5                                       62,466        76.1
Total:       299,383                      364,705

* Number of test items in each position given in Table 1.


For four-choice items, it is seen that there is an increase in 
difficulty as one goes from the first to the third choice, followed 
by a statistically significant decrease in difficulty for the fourth 
or last position. The differences in difficulty between choices 
one and two, one and three, and two and three are all statistically 
significant. The difference in difficulty between positions three 
and four is also a statistically significant one. The differences in 
difficulty between positions one and four, and two and four, are 
not significantly great. It appears, then, that as one proceeds
down the list of choices each is significantly more difficult than
the preceding until the final choice which is not significantly 
harder than the first choice in the list. 


TABLE 4. PERCENTAGE DIFFERENCES BETWEEN THE DIFFICULTY
LEVELS FOR THE VARIOUS POSITIONS AND THE
CORRESPONDING CR’s*

Correct-            4-Choice Items                            5-Choice Items
Choice
Position      1           2           3        4       1          2          3          4       5
   1
   2      0.8 (3.7)                                0.8 (3.7)
   3      2.7 (12.4)  1.9 (8.6)                    0.8 (3.7)  0.0 (0.0)
   4      0.4 (1.8)   0.4 (1.8)   2.3 (10.4)       0.9 (4.1)  1.7 (7.6)  1.7 (7.8)
   5                                               0.2 (0.9)  1.0 (4.3)  1.0 (4.4)  0.7 (3.0)





























* CR’s appear in parentheses next to the differences. 
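
The critical ratios in Table 4 can be approximated from the figures in Table 3. The sketch below is illustrative only: it assumes that each CR is the difference between two percentages divided by the usual standard error of the difference between two independent proportions, a formula this section does not itself state. The function and variable names are hypothetical.

```python
from math import sqrt

def critical_ratio(pct_a, n_a, pct_b, n_b):
    """Difference between two percentages divided by its standard error."""
    p_a, p_b = pct_a / 100.0, pct_b / 100.0
    # Assumed formula: standard error of the difference between two
    # independent proportions (unpooled).
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return abs(p_a - p_b) / se

# Four-choice items from Table 3: position -> (number of responses, per cent correct)
four_choice = {1: (74993, 77.4), 2: (74811, 76.6), 3: (77566, 74.7), 4: (72013, 77.0)}

def cr_between(table, i, j):
    """Critical ratio for the difference between positions i and j."""
    (n_i, pct_i), (n_j, pct_j) = table[i], table[j]
    return critical_ratio(pct_i, n_i, pct_j, n_j)

cr_1_3 = cr_between(four_choice, 1, 3)
cr_1_2 = cr_between(four_choice, 1, 2)
cr_1_4 = cr_between(four_choice, 1, 4)
print(round(cr_1_3, 1), round(cr_1_2, 1), round(cr_1_4, 1))
```

Under this assumption the three comparisons come out near 12.4, 3.7, and 1.8, which agrees with the corresponding entries of Table 4.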


Turning to the five-choice items, a somewhat different trend 
exists in that choices two and three show less difficulty than the 
first. The fourth, or next-to-the-last choice, is found to be (like 
the next-to-the-last choice in four-choice items) significantly 
more difficult than any other position in the list. 

No difference appears between the relative difficulty of posi- 
tions two and three in five-choice items; and these two positions 
are both significantly less difficult than any of the other positions 
in the list. 


CHECKING THE VALIDITY OF THE FINDINGS 


As a check on the validity of the above findings, a study was 
made, with a much smaller number of cases, to see if it holds true
that ‘moving’ the position of the correct choice will vary the 
difficulty level of the questions. This procedure should insure 
that the data had not resulted from any tendency on the part of 
those constructing the tests to place the more difficult correct 
choices in a particular spot, despite the fact that such did not 
appear to be the case. 

Such a check could readily be made because the tests used by 
the Board are all prepared in two equivalent forms which are
alternated among the cadets during test administration. The 
two forms consist of the same questions arranged in different 
orders of questions and choices. Thus, for a question which 
had been utilized in the data above there were obtainable data 
for the difficulty when the question itself appears in a different 
part of the examination, with the correct choice also appearing 
in a different position in the list of choices. Since the rearranged 
item had been administered to alternate cadets at the same time 
under the identical testing conditions, an ideal check was pro-
vided. The writers reason that if moving a correct choice from
the penultimate position (the most difficult according to the
findings) would in itself decrease the difficulty of that test ques-
tion, then the data brought to light should tend to substantiate
the findings.

For this purpose, six forty-item tests, two each in Mathematics, 
Physics, and Principles of Flying, were employed for checking 
the findings on five-choice items. The number of subjects in 
this instance was 18,042. When the correct choice had been 
moved from position four to any of the other four positions, 
the percentage selecting the correct choice tended to increase. 
The average difficulty when the correct choice was in position 
4 was 76.5 per cent. When these identical items were rearranged 
so that the correct choice was position 1, 2, 3, or 5, then the aver- 
age percentage selecting the correct answer increased to 78.0 per 
cent. 

For checking four-choice items, eight forty-item tests in 
Mathematics, Aircraft Engines, and Aerology were employed. 
These items had been administered to 23,183 subjects. With 
this number of items and subjects, the relative difficulty of the 
positions was noted. The difficulty of the next-to-the-last posi-
tion was greatest, with 74.8 per cent selecting the correct response.
When the correct choice had been moved from the third position 
to any of the other three, the percentage selecting the correct 
choice increased to an average of 75.8 per cent. 

In brief, the tendency of certain positions to be more difficult 
than others was noted when smaller numbers of test questions 
and subjects were used. Of even greater significance in this
connection, it showed that the position factor was not due to any 
test-construction element or statistical artifact, but was inherent 
in the position within the test items analyzed. 







DISCUSSION OF THE FINDINGS 


From the above, it seems to be a safe conclusion that a slight 
tendency exists to select certain positions in a list of choices in 
preference to others. It follows from this that the difficulty 
level of a multiple-choice test item is influenced somewhat by 
the placement of the correct choice. 

Any analysis of learning achievement as measured by large 
numbers of such questions administered to large numbers of 
subjects should include consideration of the choice-placement 
elements since this affects the results obtained. By the same 
token, those engaged in the preparation of test questions for 
such a program can make slight adjustments in item difficulty 
simply by correct-choice placement, thus obviating the necessity 
to do so through the more time-consuming and less easily con- 
trolled method generally employed—that of revising one or more 
of the choices. It is, further, worth commenting upon the fact
that understanding of the idea covered in a question is not the
only element at work in selection of the correct answer. Even 
though the position factor be relatively small, it is not only 
significant statistically, but it has significance in terms of how 
people react to the psychological conditions resulting from the 
test situation. 

It is quite true that although a slight tendency to select in 
terms of choice position has been found to exist, the data in 
themselves present no explanation for this tendency. Certain 
hypotheses, nevertheless, present themselves as more likely than 
others. The next-to-the-last position in the series, for example, 
is assuredly not the most conspicuous in its physical location. 
Both the first and last items in a list are definitely more con- 
spicuous than those in between. This is in keeping also with 
psychological findings on proactive and retroactive inhibitions 
to learning in studies of both meaningful and nonsense material. 
This line of reasoning serves at least to narrow down the most 
difficult locations to the middle two for four-choice items, and 
to the inner three positions for five-choice items. Since the 
inner choices are less noticeable, they are not as likely to be con- 
sidered as carefully as those at either end—thus are less likely of 
selection. 










Another possible explanation lies in the fact that a person who 
has not made a selection until arriving at the ultimate choice is, 
perhaps, more likely to select the last choice than to go through 
the list of choices another time. The possible confusion of 
thought resulting from perusal of several wrong answers cer- 
tainly cannot be a factor at play in this case, since the last choice 
would be subject to more proactive effects than the penultimate. 
But proactive and retroactive influences are not sole determiners, 
since in five-choice items these would be strongest for the third 
rather than the fourth position, nor do they explain the greater 
difficulty of position one, when compared with two and three, in 
the five-choice questions. 


As a by-product of the data, it is apparent that any confusion
resulting from reading wrong answers has no effect upon the
knowledge evidenced by the student. This is mentioned because 
the viewpoint has often been presented that multiple-choice 
questions may have harmful pedagogic results in that the student 
is forced to read three or four incorrect statements while respond- 
ing to such a test question. Such a view is not likely to hold up 
when we consider that recognition of the correct choice is slightly
better when it appears after several incorrect ideas have been
presented than is the case before any have been read. Here is
evidence of a tangible sort which may be used in combating the 
view, rather commonly held, that multiple-choice questions have 
a detrimental and confusing effect upon the subject’s knowledge 
as a consequence of the incorrect concepts presented. 

It is to be noted that all cases used in this study were highly 
selected males. Although the writers can, therefore, speak with 
positive assurance only in terms of the particular population sur- 
veyed, there appears to be no reason to believe that the results 
would be different had female subjects, or less highly selected 
groups of mixed subjects, been surveyed. No evidence of sex
differences in reacting to this type of question has ever been 
presented. 

Despite the fact that the present study has dealt exclusively 
with achievement testing, it may very well be true that other 
types of test employing this kind of question—e.g., personality 
and attitude inventories—show findings due in some measure to 
this position factor. A personality test question calling, for
example, for a response to the choices ‘Frequently,’ ‘Occasionally,’ 
‘Rarely,’ ‘Never,’ might reveal a slight tendency to select ‘Never’ 
in preference to ‘Rarely.’ This tendency might very well be due 
to a selection tendency rather than a difference in behavior or 
attitude relative to the circumstances given in the question. 
Conversely, it follows that those showing a slight preference to 
select the third choice in preference to the fourth might actually 
possess a behaviorally significant preference which has been 
masked by the position factor. In short, statistically insignifi- 
cant trends found in other than achievement test items—par- 
ticularly borderline instances—might be forced into more definite 
categories of significance-insignificance by virtue of taking the 
position factor into account. 

In conclusion, the writers wish to point out that although 
statistically significant differences have been found to exist with 
respect to the placement of choices in multiple-choice test items, 
these found differences are relatively small. Whether or not 
these differences have practical significance in an examining 
program is a function of the scope of the program and the degree 
of exactness required for the particular purposes involved. On 
the other hand, it might be stated even more positively that the 
existence of such a behavior tendency (to a statistically signifi- 
cant degree) in the testing situation should be a datum of informa- 
tion of considerable interest, and even importance, from the 
viewpoint of psychology, regardless of its relative size in terms of 
test-item proportions. 


CONCLUSIONS 


Bearing in mind the particular nature of the study with respect 
to subjects and test items, the following conclusions have been 
reached by the writers: 

1) The placement of choices in the several possible positions 
in four- and five-choice questions has some effect upon the diffi- 
culty level of the questions. 

2) In both types of test questions the penultimate, or next-to- 
the-last, position is the one having the greatest difficulty level. 
It is in both cases more difficult than any other position to a 
statistically significant degree. 

3) In four-choice questions, the difficulty level increases from 
the first through the third position, with the fourth position show-
ing a decrease in difficulty level which places it at a level approxi-
mately equal to that of the first.

4) In five-choice questions, the second and third positions are 
less difficult than the first; and the fourth position is most diffi- 
cult; and the fifth position is not significantly more difficult than 
the first. 

5) The position factor found to exist in achievement test items 
may also be present in other types of tests employing multiple- 
choice questions. 

6) The difficulty level of a multiple-choice test item may be
influenced to a certain extent by the placement of the correct 
choice in different positions in the list. 

7) Since the effect of correct-choice placement on the difficulty 
level of multiple-choice test questions is so slight that only a small 
difference is noted even when large numbers of cases are employed, 
it might be said that this factor is important (from a practical 
standpoint) only when dealing with many questions and large 
numbers of subjects. 

8) The data derived from this study indicate that the act of 
reading several incorrect choices has little or no effect upon the 
ability of the subject to select the correct choice in the list. 








THE CLINICAL SIGNIFICANCE OF IQ’S
ON THE REVISED STANFORD-BINET SCALE 


G. W. PARKYN 


Education Laboratory and Child Guidance Clinic, University of Otago, 
Dunedin, New Zealand 


Since the publication of the revised Stanford-Binet scale in 1937 
several tables have appeared equating T-M IQ’s with the 1916 
Stanford-Binet IQ’s. Such attempts are important, for the 
nomenclature of the S-B scale with reference to qualitatively
different levels of intelligence and their proximate IQ limits, is in 
use everywhere. 

Bernreuter and Carr¹ based such a table upon the assumptions
—first, that the Standard Deviations of IQ’s on the two scales 
were 12 points and 16 points, respectively, and, second, that both 
distributions conformed closely to the normal curve. Examples 
of equivalents given in their table, at the lower levels are: 


−1 SD         T-M IQ 84      S-B IQ 88
−1.25 SD         “   80         “   85
−1.62 SD         “   74         “   80
−2 SD            “   68         “   76
−2.50 SD         “   60         “   70
−3 SD            “   52         “   64
−4 SD            “   36         “   59
−4.12 SD         “   34         “   50


Such a result was very disturbing to clinic workers used to think- 
ing of S-B IQ’s 85, 70, and 50 as convenient, though arbitrary, 
midpoints of borderline areas between the normal, the dull, the 
high-grade defectives, and the low-grade defectives, respectively. 

Statistical refinements followed, such as those of Davis,² who
compiled elaborate tables of equivalents differing for different age
levels, to correspond with the varying SD’s reported by Terman
and Merrill³ for different age levels in their standardization group.

In spite of the apparent need for clinic workers to form radi- 
cally different ideas on the IQ limits of defect and so on, many, no 
doubt, found, as I have, that the children testing below 70 on the 
revised scale did not appear to be markedly different in status
from those previously found to be below 70 on the old S-B scale.


When statistical procedures of this nature fail to give results 
which conform to the empirical findings of clinic workers, the 
basic assumptions underlying the procedures need examining. 
In both cases quoted it is certain that there is a serious error 
involved in using SD 12 points for the S-B scale. A statement 
that the SD of a test is, say, 12 points, means that on a sampling 
truly representative of the total population on which the test is 
meant to be used, the SD is 12. Other scales administered to the
same population might give other SD’s, so in a sense the SD is a 
function of the test; but the same test administered to less repre- 
sentative samplings would give different SD’s also, so in a sense 
too, the SD is a function of the population tested. When we are 
concerned to equate the two scales of IQ’s through their SD’s, 
these must be based upon truly representative samplings. This 
could most surely be done if the same representative sampling 
were used for the standardization of both tests. Failing this, we 
must be assured that the SD’s used are derived from equally 
representative samplings. 

It is quite certain now that the nine hundred five cases on which 
the 1916 S-B scale was standardized formed a group the vari- 
ability of which was less than that of the population it sampled. 
No doubt 12 points was the SD of that group, but all the evidence 
amassed since then shows that on larger and more representative 
samplings of British and American populations the SD is much 
greater than 12 points. 

Terman himself in his study of the one thousand children of IQ 
140+ out of a population of 200,000 long ago said that such IQ’s 
occurred with a frequency of about one in two hundred. This is
consistent with an SD of 15 points, not 12. He said too that on
a large sample the SD might be between 15 and 18 points.⁴
Heilman’s⁵ study of 828 ten-year-olds gave an SD of 15.2 points,
and the Harvard growth survey of 1241 Boston children gave
14.82 points. Krugman’s⁶ study of 1361 children in Grades I, II
and III in four New York schools gave an SD of 13.7 points—the 
upper half of the distribution being consistent with an SD of 15 
points, the lower half being curtailed by the exclusion of many 
defectives. The conclusive evidence is that of the Scottish 
survey of 1935-37, in which every child born in Scotland on 
February 1, May 1, August 1 and November 1 in 1926, was
tested.⁷ For these eight hundred seventy-four children the SD
was 15.58 points. 
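
The argument from the frequency of IQ’s of 140 and over can be checked roughly as follows. This is a sketch under stated assumptions, not a reconstruction of Terman’s own calculation: it assumes a normal distribution with a mean of 100, and the function name is hypothetical.

```python
from math import erfc, sqrt

def upper_tail(iq, sd, mean=100.0):
    """Proportion of a normal distribution lying at or above `iq`."""
    z = (iq - mean) / sd
    # Upper-tail area of the unit normal curve beyond z.
    return 0.5 * erfc(z / sqrt(2))

# Expected frequency of IQ 140+ under three assumed SD's.
for sd in (12, 15, 16):
    p = upper_tail(140, sd)
    print(f"SD {sd}: about 1 in {round(1 / p)}")
```

With an SD of 12 the expected frequency is of the order of one in a few thousand, whereas an SD of 15 or 16 gives a frequency of the order of one in two hundred, which is the magnitude Terman reported.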

If, then, for comparative purposes, an absolute SD is required 
for the S-B scale, the figure should be at least 15 points. Com- 
parative tables should use this value rather than the usual 12 
points. This would have the result of lowering the predicted 
discrepancies at the lower end of the curve by an appreciable 
amount, thus:— 


                T-M IQ      S-B IQ
SD units        SD = 16     SD = 15
 −1.00            84          85
 −1.25            80          81
 −2.00            68          70
 −2.50            60          62
 −3.00            52          55
 −3.33            47          50
 −4.00            36          40
 −4.12            34          38
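
The equivalents in such a table follow from a single relation: an IQ lying z SD units from the mean equals the mean plus z times the SD of the scale. The sketch below assumes a mean of 100 on both scales, which the text implies but does not state; the function names are hypothetical.

```python
def iq_at(z, sd, mean=100.0):
    """IQ lying z standard deviations from the mean of a scale."""
    return mean + z * sd

def tm_to_sb(tm_iq, sd_tm=16.0, sd_sb=15.0, mean=100.0):
    """S-B IQ occupying the same position in SD units as a given T-M IQ."""
    z = (tm_iq - mean) / sd_tm
    return iq_at(z, sd_sb, mean)

# Whole-number SD positions from the table above.
for z in (-1.0, -2.0, -3.0, -4.0):
    print(z, iq_at(z, 16.0), iq_at(z, 15.0))
```

For example, `tm_to_sb(68)` gives 70.0 with an S-B SD of 15, while `tm_to_sb(68, sd_sb=12.0)` gives 76.0, which shows how much of the predicted discrepancy depends on the SD assumed for the older scale.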


The evidence supplied by published results of widespread
retests of children with the revised scale after originally being
tested with the S-B scale shows differences which, in fact, conform
closely to those of the above table in which the SD of S-B IQ’s was
assumed to be 15 points. Merrill showed that on the T-M
standardization group of 2904 children, the direction of IQ change 
(as indicated by a comparison of the IQ equivalents of the dif- 
ferent Percentile Ranks for each scale) was similar to that pre- 
dicted by Bernreuter and Carr, but that the magnitude of the 
change was much less for IQ’s below average. It is noteworthy 
that in the suggested nomenclature for the different levels of 
intelligence which she gives in Table VIII Merrill retains IQ 70 
as the conventional boundary of mental defect, in spite of Bern- 
reuter and Carr’s earlier article. That stand taken upon the 
empirical evidence has, I think, been justified. 

Hoakley’s study⁸ of three hundred fifty defectives in the Wayne
County Training School, showed that the range of IQ’s for the 
two tests was almost identical, being 33-109 on the revised and 
33-101 on the S-B scale. In both cases the medians were IQ 68. 
For the under fourteens, the median difference in IQ (T-M minus 
S-B) was —1 point. 






















Now it will be noted that the observed differences at the lower 
IQ levels are still less than those predicted on statistical grounds, 
while at the upper levels they are higher. The usual reason 
given for this is based upon the known inadequacies of the S-B 
at the older levels. Another reason would seem to be found in 
the second assumption on which statistical equivalences have 
been based; namely, that the two scales give IQ’s exactly con- 
forming to the normal curve. The 2904 cases of the T-M revi- 
sion are, however, skewed slightly to the upper end of the scale, 
and the effect of this is that the SD of a symmetrical curve based 
upon the upper half of the actual distribution would be larger 
than the SD of a symmetrical curve based upon the lower half. 
Hence, the SD of the revised scale may be regarded for practical 
purposes with low IQ’s as less than that for the scale as a whole. 
This would further reduce expected discrepancies in low IQ’s to
conform more closely with the actual findings. 
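
The effect of skewness on the two halves of a distribution can be illustrated numerically. The scores below are hypothetical and are not drawn from the T-M standardization group; the choice of the median as the centre, and the function name, are likewise assumptions made only for this sketch.

```python
from math import sqrt
from statistics import median

def half_sd(scores, upper):
    """SD of the symmetrical curve obtained by reflecting one half about the median."""
    centre = median(scores)
    half = [x for x in scores if (x >= centre if upper else x <= centre)]
    # Root-mean-square deviation of that half from the centre.
    return sqrt(sum((x - centre) ** 2 for x in half) / len(half))

# Hypothetical IQ's with a longer tail toward the upper end of the scale.
scores = [80, 88, 94, 98, 100, 102, 106, 114, 126, 142]
lower_sd = half_sd(scores, upper=False)
upper_sd = half_sd(scores, upper=True)
print(round(lower_sd, 1), round(upper_sd, 1))
```

In a sample skewed in this way the SD derived from the lower half is the smaller of the two, which is the sense in which the SD of the revised scale may be regarded as reduced for low IQ’s.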

A recent research has been carried out by a graduate⁹ student
in the University of Otago to find the most suitable point at which 
an arbitrary line can be drawn between the so-called ineducable 
defectives (idiots and imbeciles) and the educable defectives 
(morons) on the revised T-M scale. A careful two-year clinical 
study was made of almost all children in the city of Dunedin 
(pop. 82,000) whose IQ’s were between 40 and 60 approximately,
involving appraisal of their intellectual abilities, scholastic
attainments, their social development and their emotional charac-
teristics. Then in each case a decision was made on the best
educational provision for the child: whether he should go to the
Occupation Centre (a special school for low-grade defectives) or 
to one of the Special Classes (for backward children and high- 
grade defectives) attached to the ordinary schools. The area on 
the revised scale was then found in which most difficulty was 
encountered in making the critical decision. This area centred 
round IQ 50. 

No doubt by now many other clinic workers will have found as 
we have, that for children testing under IQ 80 on the revised scale 
much the same lines of demarcation of the different levels of 
intelligence may be used as with the 1916 scale. 


REFERENCES 


1) Bernreuter, R. G. and Carr, E. J. “The Significance of IQ’s on
the T-M Stanford-Binet.” Jnl. Ed. Psych., 1938, April, No. 4, pp. 312-314.

2) Davis, F. B. “The Interpretation of IQ’s Derived from the 1937
Revision of the Stanford-Binet Scales.” Jnl. Appl. Psych., 1940, pp.
595-604.

3) Terman, L. M. and Merrill, Maud A. Measuring Intelligence.
1937, p. 40.

4) Terman, L. M. et al. Genetic Studies of Genius. Vol. I, p. 633.

5) Heilman, J. D. Jnl. Ed. Psych., Jan. 1933, quoted by Redmond
and Davies, The Standardization of Two Intelligence Tests. New Zea-
land, 1940, p. 37.

6) Krugman, M. “Some Impressions of the Revised Stanford-Binet
Scale.” Jnl. Ed. Psych., 1939, Nov., No. 8, pp. 594-603.

7) Macmeeken, A. M. The Intelligence of a Representative Group of
Scottish Children. London, 1939.

8) Hoakley, Z. Pauline. “A Comparison of the Results of the Stan-
ford and Terman-Merrill Revisions of the Binet.” Jnl. Appl. Psych.,
1940, Vol. XXIV, pp. 75-81.

9) de Lautour, Patricia M. The Borderline Child. Unpublished
thesis prepared for M.A. in Education, 1944, University of Otago,
Dunedin, New Zealand.








RELATIONSHIP BETWEEN 
THE GOODENOUGH DRAWING A MAN TEST 
AND THE 1937 REVISION 
OF THE STANFORD-BINET TEST 


GELOLO McHUGH 
Barnard College 


The data for this report are taken from previous publications*4 
in which changes in Binet and Goodenough IQ’s at the Public 
School Kindergarten Level have been considered. The data 
used are Goodenough and Binet scores earned by ninety kinder- 
garten children at the second administration of these tests which 
occurred after the subjects had from one to three months of school 
experience. 

A survey of the literature was made to determine whether any 
previous reports on the relationship between these two tests are 
available. A careful search disclosed twenty-eight articles and 
books dealing with various aspects of child development as 
reflected in children’s drawings published since 1937, but none of 
these deal with relationship between Goodenough and Binet 
tests. A search was not made of the literature prior to 1937, 
which is the publication year of the Revised Stanford-Binet. 
Reference to the relationship between the Goodenough test and 
the 1916 Stanford-Binet will be made in the discussion of results. 

Since the Goodenough test and the Stanford-Binet are well 
known instruments which have had much use in published 
research, no effort will be made to describe them. Bibliography 
references (2) and (5) give complete details as to their standard- 
ization, norms, administration and uses. 


ADMINISTRATION AND SCORING OF THE TESTS 


The two tests were administered to individual subjects under 
standard conditions by trained examiners of long experience. In 
every instance the subjects were retested by the same examiner 
who had administered initial tests during the two weeks prior to 
the beginning of school. The mean school attendance at the 
time of the second test was 30.2, SD 12.2 school days. (? p. 16, 
Table 8.) The mean CA of the subjects at the time of this test 
was 64, SD 3.97 months. (* p. 11, Table 2.) The sexes were 
approximately equally represented by forty-three boys and 
forty-seven girls. All tests were administered without knowledge
of previous scores since no tests were scored until all data were 
secured. (#p.9.) In the administration of the Stanford-Binet 
test forms L and M were used for equal numbers of subjects. 
Both examiners scored all Binet tests. (* p. 9) 

The instructions given by Goodenough (² p. 85) for group
administration of the Drawing a Man Test had to be slightly
altered in individual administration. The test was adminis-
tered to each subject twice in succession at the end of the Binet
testing period. In presenting the test materials (pencil and
paper) to the subject for the first trial the examiner said, “And
now, I want you to make a picture of a man. Take your time
and be careful. Make the very best picture of a man that you
can.” At the second trial, which came after the subject’s first
effort had been numbered and removed from sight, the examiner
said, “I want you to make just one more good picture of a man
and then we will be through working.” In the final assembly of
scores for this report each subject has been credited with what-
ever was the best score of these two trials. All Goodenough
drawings were scored by one examiner and careful checks have
been made as to the accuracy of these scores by having the draw-
ings re-scored by others who were without knowledge of the
examiner’s scores. Details of these checks have been reported
elsewhere.⁴


RESULTS 


1) Goodenough MA and Binet MA: A positive r of .45, PE .06 
has been obtained between Goodenough MA and Binet MA for 
ninety subjects. This r does not compare favorably with Good- 
enough’s r of .70 between Goodenough MA and 1916 Binet MA 
for ninety-four children five years of age. (² p. 50, Table 9.)
The r reported here probably is somewhat depressed by the fact 
that the two forms of the 1937 Revision of the Binet test were 
used. The author has in progress a study of relationship between 
form L of the Binet test and the Goodenough test which promises 
to show a relationship approaching that of Goodenough above. 

2) Goodenough IQ and Binet IQ: A positive r of .41, PE .06, has 
been obtained between the Goodenough IQ and Binet IQ of these 
subjects. Goodenough does not report on the relationship 
between Goodenough IQ and Binet IQ for specific age levels, 
but shows (² p. 51, Table 10) a positive r of .74 between “Draw-
ings IQ and Stanford-Binet IQ for ages four to ten years with
three hundred thirty-four subjects.” Again, the r of .41 reported
here may be depressed because of the use of two forms of the 
Binet test. 

3) Goodenough Scoring Items and Binet IQ: Bi-serial correla-
tions (¹ pp. 366-371) have been computed for the relationship
between each of the fifty-one Class B scoring items of the Good-
enough test (* p. 214) and Binet IQ. Positive bi-serial r’s rang-
ing from .01 to .54 were obtained for thirty of these items with
the remaining twenty-one yielding zero or slight negative rela-
tionships. Table 1 identifies the thirty items which have posi-
tive bi-serial relationship with Stanford-Binet IQ and shows the
degree of relationship for each.
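
The probable errors quoted with the r’s in results 1) and 2) appear to agree with the conventional formula PE = .6745(1 − r²)/√N. The sketch below assumes that this is the formula used, which the report does not state; the function name is hypothetical.

```python
from math import sqrt

def probable_error_r(r, n):
    """Probable error of a product-moment r for n cases (conventional formula)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

pe_ma = probable_error_r(0.45, 90)  # Goodenough MA and Binet MA
pe_iq = probable_error_r(0.41, 90)  # Goodenough IQ and Binet IQ
print(round(pe_ma, 2), round(pe_iq, 2))
```

Both values round to .06 for ninety subjects, in agreement with the figures reported.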

From Table 1 it is seen that only thirty of the fifty-one Good- 
enough scoring items contribute to the positive relationship 
between Goodenough and Binet scores of ninety public school 
kindergarten subjects. If the Binet test scores are accepted as 
criteria of true IQ’s of these subjects, then the number of scoring
items of the Goodenough test may be reduced at this age level 
by a minimum of slightly more than forty per cent with no loss 
in test validity. Table 2 shows that even greater reduction in 
the number of scoring items used at this age level is possible with 
gains rather than losses in the validity of the Goodenough test 
against the Binet test as a criterion. 
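
A bi-serial r of the kind used here to screen scoring items can be computed as in the following sketch. It assumes the conventional formula, r bis. = ((Mp − Mq)/σ)(pq/y), where y is the ordinate of the unit normal curve at the point dividing it into proportions p and q. The data are hypothetical and the function name is not taken from the report.

```python
from statistics import NormalDist, mean, pstdev

def biserial_r(passed, scores):
    """Bi-serial r between a pass/fail scoring item and a continuous score."""
    credited = [s for ok, s in zip(passed, scores) if ok]
    not_credited = [s for ok, s in zip(passed, scores) if not ok]
    p = len(credited) / len(scores)  # proportion credited with the item
    q = 1.0 - p
    unit_normal = NormalDist()
    # Ordinate of the unit normal curve at the point cutting off proportion p.
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    return (mean(credited) - mean(not_credited)) / pstdev(scores) * (p * q / y)

# Hypothetical data: ten children, one scoring item, and their Binet IQ's.
item_credit = [True, True, True, False, True, False, False, True, False, False]
binet_iq = [118, 112, 109, 104, 101, 99, 96, 95, 90, 86]
r_bis = biserial_r(item_credit, binet_iq)
print(round(r_bis, 2))
```

An item is retained under the procedure described above only when such an r is positive; the function requires that some but not all subjects be credited with the item.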

In Table 2 it is seen that when all Goodenough tests are 
rescored in terms of items which yield positive bi-serial r’s with 
Binet IQ, there is a gradual increase in the validity of the Good- 
enough test against the Binet test as a criterion with the elimina- 
tion from the scoring of the Goodenough of items which yield the 
lower positive bi-serial r’s. The greatest validity of the Good- 
enough test is obtained when it is scored on nine items which 
correlate .30 bis. or better with Binet IQ. It is believed that this
correlation between the Goodenough test and the Binet test will 
be considerably improved when the scores for the Binet are 
derived from one form of the test. It is recognized that the 
correlations reported in Table 2 cannot be finally accepted as 
valid until similar results are obtained on a completely new set 
of data in which the same subjects are not also used to obtain 
bi-serial relationships between Goodenough test items and Binet 
test scores. 

TABLE 1. POSITIVE BI-SERIAL RELATIONSHIP BETWEEN THIRTY
GOODENOUGH SCORING ITEMS AND 1937 STANFORD-BINET
IQ

Good-     r bis.
enough    with
Items     Binet IQ    Description of Goodenough Items
 2          .48       Legs present
 3          .25       Arms present
 4a         .20       Trunk present
 4b         .19       Length of trunk greater than breadth
 4c         .33       Shoulders indicated
 5a         .23       Both arms and legs attached to trunk
 6a         .04       Neck present
 6b         .21       Outline of neck continuous with that of head
                      or trunk
 7a         .26       Eyes present
 7b         .22       Nose present
 7c         .47       Mouth present
 7d         .25       Both nose and mouth shown in two dimen-
                      sions; two lips shown
 7e         .33       Nostrils indicated
 8a         .14       Hair shown
 9a         .40       Clothing present
10a         .29       Fingers shown
10b         .31       Correct number of fingers shown
10c         .28       Fingers shown in two dimensions
10d         .01       Opposition of thumb shown
10e         .16       Hand shown as distinct from fingers or arms
11b         .35       Leg joint shown, either knee, hip, or both
12a         .27       Head in proportion
12c         .36       Legs in proportion
12e         .54       Both arms and legs shown in two dimensions
13          .35       Heel shown
16a         .19       Eye detail. Brow or lashes shown
16b         .23       Eye detail. Pupil shown
16c         .25       Eye detail. Proportion
17a         .17       Both chin and forehead shown
18a         .17       Profile with not more than one error


Goodenough (² p. 48) has reported a reliability coefficient of .93
for her test on test re-test scores of one hundred ninety-four
first-grade children. The author has reported r’s of .91 and .86
between consecutive trials on the Goodenough test for groups of
eighty-three and ninety public school kindergarten children.⁴







TABLE 2. r BETWEEN SCORINGS OF GOODENOUGH TEST WITH
ITEMS OF VARIOUS DEGREES OF POSITIVE BI-SERIAL COR-
RELATION TO BINET IQ AND BINET MA, AND IQ

                          .01   .10   .15   .20   .25   .30   .35   .40
                          and   and   and   and   and   and   and   and
                          over  over  over  over  over  over  over  over

No. of items               30    28    26    21    16     9     7     4
r’s with 1937 Binet IQ    .47   .47   .48   .80   .51   .58   .63   .50
r’s with 1937 Binet MA    (values illegible)


When the drawings used in the present report were re-scored in
terms of groups of items selected according to size of bi-serial r’s
with Binet IQ, it was decided to correlate the new scores in each
case with the total score obtained when the drawings were scored
on Goodenough’s fifty-one items. It is recognized that the r’s
reported for these in Table 3 are not measures of the reliability of
the revised scoring of the Goodenough test, but their size indi-
cates whether or not there may be appreciable losses in the reli-
ability of the Goodenough test when it is scored on fewer items
which increase its validity against the Binet test as a criterion.


TABLE 3.—r’s BETWEEN GOODENOUGH TESTS RECORDED ON
ITEMS WHICH CORRELATE POSITIVELY .01 AND ABOVE
WITH BINET IQ AND SCORES ON THE SAME
GOODENOUGH TESTS SCORED ON GOOD-
ENOUGH’S FIFTY-ONE SCORING ITEMS

                        .01   .10   .15   .20   .25   .30   .35   .40
                        and   and   and   and   and   and   and   and
                        over  over  over  over  over  over  over  over

No. of items....         30    28    26    21    16     9     7     4
r’s with Goodenough
  score on 51 items     .99   .99   .98   .94   .89   .79   .75   .70


In Table 3 it is seen that there is no loss in the reliability of the
Goodenough test for these subjects when the test is scored on
items which correlate .10 or better with Binet IQ. It further
appears that no appreciable losses in reliability occur until the
revised scoring is limited to nine items which correlate .30 or
better with Binet IQ. It is believed that even this r of .79 may
be increased when the bi-serial r’s are recalculated to determine
the relationship between individual items of the Goodenough test
and Binet MA, form L, rather than IQ’s on equal numbers of
form L and M.

SUMMARY AND CONCLUSIONS


The relationship between the Goodenough Drawing a Man
Test and the 1937 Revision (forms L and M) of the Stanford-
Binet Test has been studied in the scores of ninety public school
kindergarten children with a mean CA of 64 months, SD 3.97
months. A significant r of .45, PE .06, between the MA scores of
the two tests and an r of .41, PE .06, between the IQ scores have
been demonstrated. These r’s probably are somewhat lower than
would be obtained with all subjects tested on one form of the
Binet test.

Bi-serial r’s between individual scoring items of the Good- 
enough test and Binet IQ of the same subjects indicate that only 
thirty of the fifty-one Goodenough scoring items contribute to 
the positive relationship between the two tests. It is further 
demonstrated that the relationship between the two tests is 
improved by limiting the scores on the Goodenough test to items 
which show the higher bi-serial r’s with Binet IQ, and that the
best relationship is obtained when the scoring of the Goodenough
test is limited to nine items which correlate .30 or better with
Binet IQ.

r’s between scores for the Goodenough test which are limited to 
items which correlate positively with Binet IQ and scores on the 
Goodenough test using all of Goodenough’s fifty-one items indi- 
cate that the number of items used for scoring this test, when it is 
used with kindergarten children, could be greatly reduced with- 
out an appreciable loss in reliability. 
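
The item-selection procedure summarized above can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the author's worksheet. It assumes the usual textbook formula for bi-serial r (difference between the criterion means of passers and failers, divided by the criterion SD, times pq over the normal ordinate at the point of division), which the article does not spell out. The function names, the item labels "a", "b", "c", and the eight IQ values are all hypothetical and were invented for the example.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(item, criterion):
    """Bi-serial r between a pass/fail item (1/0) and a continuous criterion.

    Assumed formula: (M_pass - M_fail) / SD * (p * q / y), where y is the
    ordinate of the unit normal curve at the point dividing p from q.
    """
    passed = [c for i, c in zip(item, criterion) if i == 1]
    failed = [c for i, c in zip(item, criterion) if i == 0]
    sd = pstdev(criterion)
    if not passed or not failed or sd == 0:
        return 0.0  # item (or criterion) does not vary; treat as unrelated
    p = len(passed) / len(item)
    q = 1.0 - p
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    return (mean(passed) - mean(failed)) / sd * (p * q / y)


def pearson_r(x, y):
    """Product-moment r between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_items(items, criterion, threshold):
    """Labels of items whose bi-serial r with the criterion is >= threshold."""
    return [k for k, v in items.items() if biserial_r(v, criterion) >= threshold]


def rescore(items, keep):
    """Each child's score counted only on the items named in `keep`."""
    n_children = len(next(iter(items.values())))
    return [sum(items[k][i] for k in keep) for i in range(n_children)]


# Hypothetical data: eight children, three pass/fail scoring items.
iq = [90, 95, 100, 105, 110, 115, 120, 125]
items = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],  # passed mostly by the brighter children
    "b": [1, 1, 1, 1, 0, 0, 0, 0],  # passed mostly by the duller children
    "c": [0, 1, 0, 1, 0, 1, 0, 1],  # only weakly related to IQ
}

full_score = rescore(items, list(items))       # scored on every item
kept = select_items(items, iq, 0.01)           # items with positive bi-serial r
short_score = rescore(items, kept)             # revised scoring

validity = pearson_r(short_score, iq)          # revised score against the criterion
agreement = pearson_r(short_score, full_score) # revised score against full score
```

Raising the threshold passed to `select_items` corresponds to moving rightward across the columns of Tables 2 and 3: fewer items are retained, and one can watch how `validity` and `agreement` change. With so few invented cases the bi-serial values are unstable and can exceed 1.00, so the numbers themselves carry no meaning.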


BIBLIOGRAPHY 


1) Garrett, H. E., Statistics in psychology and education. New York: 
Longmans Green, 1939. 

2) Goodenough, F. L., Measurement of intelligence by drawings. 
Yonkers, N.Y.: World Book Company, 1926. 

3) McHugh, G., Changes in IQ at the public school kindergarten
level. Psychol. Monog. #250, 1943.

4) McHugh, G., Changes in Goodenough IQ at the public school 
kindergarten level. Jour. Ed. Psy. xxxvi: 1, January, 1945, pp. 17-30. 

5) Terman, L. M. and Merrill, M. A., Measuring intelligence. Bos- 
ton: Houghton Mifflin, 1937. 





BOOK REVIEWS 


John R. Beery. Current Conceptions of Democracy. New
York: Bureau of Publications, Teachers College, Columbia 
University, Contributions to Education, No. 888, 1943, 
pp. 109. 


This study takes the view that democracy means what the 
mass of people think it means. The purpose is “to investigate
currently held conceptions of democracy with a view of discover- 
ing the major points of agreement and disagreement.” 

Short statements about the meaning of democracy were taken 
from various sources and organized into questionnaire form. 
Each item was rated on a five-point scale. Six different groups 
were tested, 953 people in all. 

Analysis of results revealed a large body of democratic theory 
on which a sizable majority of the subjects were agreed. This 
included material concerned with “respect for the individual,
equality, reliance on intelligence and rational methods, liberty,
faith in the common man as a source of power, and duties and
obligations of the democratic citizen.” The disagreements were
concentrated in the economic area and in practical applications of 
general principles. There were some group differences in 
responses. Educational implications are listed. 

Certain limitations of the study are noted by the author. 
Since a representative sample was not tested, the results do not 
necessarily reflect the views of the country as a whole. This 
statement should have been more positive. The views obtained 
cannot be representative of the total population. Furthermore, 
the author’s conclusion that the responses from any one group 
are representative of the views of that group may be seriously 
questioned. When only thirty-nine, twenty-seven, fifty-nine, 
thirty-eight and forty-two per cent of questionnaires were 
returned, there is no way of knowing what selective factors were 
operating in determining who was to return the questionnaire. 
Certainly there is little chance of the obtained sample being 
representative of the group. The fact that increasing the number
of subjects (like those responding) would make little change
in the results has little bearing on the adequacy of the sample.
While the technique developed by the author has considerable 
promise, it is advisable in future work to obtain a representative
sample either in dealing with the general population or with
specific groups. The questionnaire method of collecting data 
appears unsatisfactory in this kind of study. 
MILES A. TINKER
University of Minnesota 


Silvan S. Tomkins, Editor. Contemporary Psychopathology.
Cambridge, Mass.: Harvard University Press, 1943, pp. 600. 


Contemporary Psychopathology, edited by Silvan S. Tomkins,
is intended to serve as a source book for supplementary reading 
in abnormal psychology courses. Editing this volume involved 
selecting and arranging forty-five articles by fifty-four con- 
tributors, writing a one-page preface, and having Henry A. 
Murray write a two-page introduction. The volume includes a 
large variety of articles which describe the currents of research 
in physiology, medicine, psychiatry, and sociology. Most of 
the contributors are psychiatrists and psychologists. All the 
contributions contained are presented under four large divisions, 
as follows: Mental Disease in Childhood, Psychoneuroses and 
Psychosomatic Medicine, The Schizophrenic Psychoses, and 
Experimental Psychopathology. All in all, the book includes a 
fairly good selection of illustrative and interpretative material 
for modern psychopathology. Not all the articles have been 
recently written. Some of Franz Alexander’s work goes back 
to 1934, for example, and other contributions date back to 1938 
and 1939. 

Murray in his introduction to the book praises the volume because
the editor permits each contributor to speak his piece without 
fear of interruption. He gives the editor credit for being a self- 
effacing host. He compares the book to Taylor’s Readings in 
Abnormal Psychology, written in 1926. In this latter volume
Taylor not only commented on the articles included but also
attempted to connect them, considered their significance, and
introduced and interpreted them. Significant
changes in psychopathology have taken place since that time. 
Changes noted by Murray in comparing the two volumes are: 
(1) a shift in emphasis from the descriptive and philosophical 
towards a more dynamic consideration of behavior, (2) an increas- 
ing spread of psychoanalytic influence, and (3) an increase in the 
use of experimental procedures. Not included in this volume
are some contributions from the field of cultural anthropology or
semantics.

Since Murray compared this book with Taylor’s Readings in 
Abnormal Psychology, it might be interesting also at the present 
time to compare it with the two volumes recently edited by 
J. McV. Hunt. In Hunt’s two volumes on Personality and 
Behavior Disorders there are thirty-five contributors who wrote 
special articles. In these volumes there is an attempt at coördi-
nation and integration which does not appear in the source book
under review. Most of the articles selected for inclusion in this
source book are recent publications. The largest number of 
single articles taken from one source are from The Psychoanalytic 
Quarterly; nine articles are from this journal. Five more are 
taken from Psychosomatic Medicine. Four are from the Archives 
of Neurology and Psychiatry. The psychological journal which 
gets the largest number of selections is the Journal of Abnormal 
and Social Psychology from which four articles are taken. The 
other articles are selected from a large range of psychological and 
psychiatric journals. 

The editor says that this book is intended for relatively imma- 
ture students. To such students the selection of articles here 
can give at least a sampling notion of illustrative and interpreta-
tive materials from contemporary psychopathology.
A more adequate and coördinative notion of the field such stu-
dents can now obtain from Hunt’s two volumes. Personally,
the reviewer thinks the volume would be more valuable if it did 
have some prefatory material and some annotations concerning 
the various contributions and their significance. As the volume 
stands there is no denying that for students who are not proficient 
enough or interested enough or who do not have time enough to 
go to the library and search out materials on their own, or for 
instructors who can give them enough of a bibliography to 
supplement textbooks that are used, this should serve as an
excellent supplementary volume. Its chief advantage is that
it is easy to get to and it does include enough of a selection that 
will serve as a good source of supplementary material. For a 
good course in abnormal psychology, however, it probably would 
be advisable for a more efficient instructor still to advise the 
students to use the library and to go back to the original source 
material, also to familiarize themselves with literature as it is
growing and developing. For those who are too lazy this volume
will be more of an asset than for those who are not. But it does
contain enough of the significant articles showing some of the
trends, at least, and should serve a useful purpose in a teaching
institution.
H. MELTZER
Psychological Service Center, St. Louis, Missouri























